Get all participants on the same page, provide background and impetus for this project
Agenda
Introductions: name, institution, interest in this effort, relevant expertise (15 mins)
Tanya - very brief history, overview of current motivation, TAIR's efforts since Araport11 release (10 mins)
Françoise/NCBI team member - overview of the NCBI Eukaryotic Genome Annotation pipeline, using the initial run with the Naish T2T genome as an example (15 mins)
Korbinian - overview of Col-CC (community consensus) assembly progress so far (15 mins)
...
Who would handle submission to GenBank and how can we best prepare for a smooth submission?
Schedule follow up meetings for subgroups (assembly, manual review, other)
Summary
General enthusiasm for the need and utility of a reannotation.
Proposed timeline: 12 calendar months to set up the framework, process, and teams needed to get V12 released.
Funding: No dedicated, separately-sourced funding for any particular group at this time. Interested groups will contribute expertise and/or infrastructure.
Assembly
need to work out details of tracking the metadata on BioSample provenance for the individual pieces
K. Schneeberger's group's work on assembling a Col-Community Consensus (Col-CC) assembly is likely to finish by the end of 2022, and will incorporate C. Pikaard's group's data on NOR2 and NOR4 and 4 Col-0 MA lines from F. Rabanal/D. Weigel
Col-CC should be submitted to NCBI as an independent assembly
Idea to visualize the multiple individual assemblies that were combined to make Col-CC as a patchwork (GCV? other visualization tool?)
Automated Annotation
once the Col-CC assembly has been accepted by NCBI and is available, NCBI will run it through their eukaryotic annotation pipeline
need to resolve details on whether or not to include the Araport11 proteins as evidence
add Iso-Seq data from PRJNA755474 (from this paper) to the next run
please send more recent Iso-Seq/RNA-seq/CAGE experimental data available in GenBank for inclusion in the next run
Manual Review
WebApollo as tool
used by many MODs to maintain their genomes; concurrent editing possible; community-maintained code
TAIR to investigate hosting requirements and existing training tools, and how easily the information needed for NCBI submission can be output, even before manual review begins
TAIR as coordinator
Community experts
Klaas van Wijk: anything to do with proteins (including small peptides - sORFs, etc.) and protein isoforms (AS, etc.)
Kai Ye: XJTU team would work on centromeres and microsatellite sites
Shujun Ou, Alex Bousios: TEs, ATHILAs
Craig Pikaard: NOR2 and NOR4, rDNAs
Submission to NCBI/GenBank
begin working on the release early; no need to wait until manual review is done, since format issues can be worked out with dummy data
Dissemination
broad support for authorship on V12 paper for ALL who were involved in effort, in any stage of the process
V12 release to be incorporated into TAIR, BAR, etc. as soon as possible after NCBI RefSeq is updated to this version
Action items
We'll check in by email in mid-December to get an update from Korbinian and from TAIR on the assembly progress and WebApollo.