How a new article is curated in TAIR
We have come a long way in understanding gene function in plants since the first Arabidopsis mutant studies in the 1970s. The continuous global effort to elucidate the function of each gene in the Arabidopsis genome has so far resulted in 34,268 papers linked to 24,810 loci in TAIR. If you have ever wondered how this information got into TAIR, keep reading for a glimpse behind the scenes.
The curation process at TAIR starts each week with an automated search of NCBI PubMed for papers published in the previous week that contain “Arabidopsis” in the title, abstract or MeSH terms. All abstracts matching this search are downloaded into our curation database. An automated search for existing and potential new gene names then links genes to papers. A curator validates these automated associations and, when necessary, creates new gene symbols and adds them to the database. When no gene symbol is mentioned in the abstract, a curator checks the full text for gene names and symbols that can be linked to an existing gene in TAIR. As a result of this linking process, users can retrieve a verified set of literature for a given locus from TAIR.
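For readers who want to build a similar literature feed, the two automated steps above (the weekly PubMed search and the first-pass gene linking) can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not TAIR's pipeline: the PubMed field tags ([tiab] for title/abstract, [mh] for MeSH), the use of the Entrez date (datetype=edat) as a stand-in for "published in the previous week", and the helper names are our own choices, and symbol_to_locus is a hypothetical stand-in for TAIR's gene symbol table. The sketch only builds the query URL; it does not contact NCBI.

```python
import re
from urllib.parse import urlencode

# NCBI E-utilities esearch endpoint (public service).
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# AGI locus codes such as AT5G10140: "AT", chromosome (1-5, C = chloroplast,
# M = mitochondrion), "G", then five digits.
AGI_PATTERN = re.compile(r"\bAT[1-5CM]G\d{5}\b", re.IGNORECASE)


def build_weekly_query_url(days: int = 7) -> str:
    """Return an esearch URL for PubMed records mentioning Arabidopsis.

    Assumption: title/abstract ([tiab]) or MeSH ([mh]) matches, restricted
    to records whose Entrez date falls within the last `days` days.
    """
    params = {
        "db": "pubmed",
        "term": "Arabidopsis[tiab] OR Arabidopsis[mh]",
        "datetype": "edat",  # Entrez date: when the record entered PubMed
        "reldate": days,     # only the last `days` days
        "retmax": 10000,     # largest page size esearch returns for PubMed
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def candidate_gene_links(abstract: str,
                         symbol_to_locus: dict[str, str]) -> dict[str, set[str]]:
    """Map each locus to the strings in `abstract` that point to it.

    `symbol_to_locus` is a hypothetical lookup table (upper-case symbol ->
    AGI code). Results are candidates only; a curator still confirms them.
    """
    links: dict[str, set[str]] = {}
    # Explicit locus identifiers are the most reliable evidence.
    for match in AGI_PATTERN.finditer(abstract):
        links.setdefault(match.group(0).upper(), set()).add(match.group(0))
    # Known gene symbols, compared case-insensitively so that mutant names
    # written in lower case (e.g. "flc") are also caught.
    for token in re.findall(r"[A-Za-z0-9-]+", abstract):
        locus = symbol_to_locus.get(token.upper())
        if locus:
            links.setdefault(locus, set()).add(token)
    return links


url = build_weekly_query_url()
abstract = "FLC represses flowering; expression of AT5G10140 was reduced in flc mutants."
links = candidate_gene_links(abstract, {"FLC": "AT5G10140"})
print(url)
print(links)
```

Simple string matching like this inevitably produces false positives (short symbols that are also ordinary words or abbreviations) and misses genes mentioned only in the full text, which is why every automated link is checked by a curator.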
All initially curated papers are ranked by gene novelty and by the amount of information already available in the TAIR database, so that most of the curation effort goes towards filling in the blanks on the Arabidopsis genome. Next, the curator reads the full text of the highest-priority papers and adds the relevant information to the TAIR database. The information captured includes gene descriptions, mutant alleles, germplasms and observed phenotypes, as well as experimental results and the Gene Ontology (GO) and Plant Ontology (PO) annotations derived from them. The GO annotations capture information on gene function, and the PO annotations capture where and at which developmental stages a gene is expressed. Authors can also provide GO/PO annotations and summary information through our online submission tool, TOAST (toast.arabidopsis.org); TAIR curators review all author submissions before incorporating them. All information entered in the TAIR database becomes publicly available on the TAIR website each Saturday (US Pacific time). Subscribers, as well as users who have not yet reached the monthly limit of free page views, can see these new data.
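The post does not say how gene novelty and existing information are weighed against each other, so the following is only a hypothetical illustration of the prioritization idea: each locus linked to a paper contributes 1/(1 + n), where n is the number of annotations TAIR already holds for it, and every gene symbol newly created for the paper adds a fixed bonus. The Paper class, the scoring formula, the bonus weight and the example data are all our assumptions, not TAIR's actual ranking rule.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    pubmed_id: str
    loci: list[str]            # loci linked to the paper during initial curation
    new_gene_symbols: int = 0  # symbols created for the first time for this paper


def priority(paper: Paper, annotation_counts: dict[str, int],
             new_symbol_bonus: float = 1.0) -> float:
    """Hypothetical score: higher the less TAIR already knows about the genes.

    A locus with no existing annotations contributes 1.0 and one with 9
    contributes 0.1, so well-studied genes barely move the score.
    """
    gap = sum(1.0 / (1 + annotation_counts.get(locus, 0)) for locus in paper.loci)
    return gap + new_symbol_bonus * paper.new_gene_symbols


def rank_papers(papers: list[Paper], annotation_counts: dict[str, int]) -> list[Paper]:
    """Return papers in descending priority order (highest first)."""
    return sorted(papers, key=lambda p: priority(p, annotation_counts), reverse=True)


# Made-up example data: locus-A is well annotated, locus-B and locus-C are not.
counts = {"locus-A": 120, "locus-B": 0}
papers = [
    Paper("1", ["locus-A"]),
    Paper("2", ["locus-B"]),
    Paper("3", ["locus-C"], new_gene_symbols=1),
]
ranked = rank_papers(papers, counts)
print([p.pubmed_id for p in ranked])  # ['3', '2', '1']
```

The paper introducing a brand-new gene symbol comes first, the paper on an unannotated locus second, and the paper on a heavily annotated gene last, which matches the stated goal of spending most of the effort on the blanks.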
With every new batch of recently published articles, the TAIR data become more complete and refined. As we fill in the blanks on the Arabidopsis genome, we give future studies a foundation for new discoveries, which will in turn end up in TAIR and fill in more blanks. So keep those papers coming, and submit your data so that we can continue to build our community resource!