Data Submission

Guidelines for the data types that can be submitted to TAIR follow below. If you have any questions, or to submit your completed files for processing, please contact curator@arabidopsis.org.

External Links

We encourage TAIR users to submit external links associated with data objects found in TAIR. Submitted links will be added to the external link band on the object detail pages. Accepted data objects include Loci, Clones, Genes, Polymorphisms, Genetic Markers, and Clone Ends. The locus example below would create a link on this locus detail page.

NOTE: If you want to link to ALL Arabidopsis LOCI and are using the AGI code (e.g. AT4G32520) as part of the variable for the URL, just send the External website name, base URL, and variable syntax to: curator@arabidopsis.org

If you would like to submit links for a subset of all the data objects found in TAIR, please submit your data in one of the following ways:

Option 1) Use a preformatted Excel spreadsheet. To submit your data, please download and complete the following Excel spreadsheet. Note that examples and instructions have been included. Before submitting your data, review your entries to ensure that the data is correct.

Note: Macros must be enabled for this form to work properly. To allow the macros in this form to run, please change your macro security level to medium (recommended) or low. From the Tools menu, choose Macro, then Security. After you change the security level to medium or low, you will have to restart Excel.

Download: external_link_data_form.xls

Option 2) Send tab-delimited data from any program. If you want to create your own file, please use the format below and include the column headers as the first line of your file.

Field | Description | Example
Object Name | The name of the object found in TAIR to be linked | AT4g32520 or ATDMC1.1
Object Type | Type of object. Select from: Locus, Clone, Gene, Polymorphism, GeneticMarker | Locus or GeneticMarker
External Web Site - Name | Name of the website that the Base URL refers to | View AraCyc information
External Link - Base URL | Base rule for the URL. This is usually the portion of the external link URL before the question mark (?) | http://www.arabidopsis.org:1555/ARA/NEW-IMAGE
External Link - Variable | Variable for the URL. This is usually the portion including and following the question mark (?) | ?type=REACTION-IN-PATHWAY&object=GLYOHMETRANS-RXN
Display Name | Name of the link to be displayed. For example, if the link is for an associated marker, you can provide the marker name. | formylTHF biosynthesis
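
If you prefer to generate the tab-delimited file with a script, the following is a minimal sketch rather than an official TAIR tool. It assumes the column headers are exactly the field names listed above and that the final link is simply the Base URL followed by the Variable. The function names (`full_url`, `write_links`) are hypothetical, and the example row just combines the example values from the field list.

```python
import csv
import io

# Column headers taken from the field list above; they form the first line.
HEADERS = [
    "Object Name",
    "Object Type",
    "External Web Site - Name",
    "External Link - Base URL",
    "External Link - Variable",
    "Display Name",
]


def full_url(base_url, variable):
    """Join the Base URL and the Variable part (assumed: plain concatenation)."""
    return base_url + variable


def write_links(rows):
    """Return tab-delimited text with the header row first."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=HEADERS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative row built from the example column above.
example_row = {
    "Object Name": "AT4g32520",
    "Object Type": "Locus",
    "External Web Site - Name": "View AraCyc information",
    "External Link - Base URL": "http://www.arabidopsis.org:1555/ARA/NEW-IMAGE",
    "External Link - Variable": "?type=REACTION-IN-PATHWAY&object=GLYOHMETRANS-RXN",
    "Display Name": "formylTHF biosynthesis",
}
links_text = write_links([example_row])
```

Previewing `full_url(...)` for a few rows before sending the file is a quick way to confirm that each link resolves as intended.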

Gene Class Symbol Registration

In order to register a Gene Class Symbol, you must be registered at TAIR.



These symbols typically consist of 3 or 4 letters that define either a single gene (ABC) or a gene family (ABC1, ABC2, ABC3). Although a number of classical genetic loci were assigned 2-letter symbols years ago, the continued use of 2-letter symbols to name new loci is strongly discouraged except in cases where there is a compelling reason based on the underlying science. A similar justification should be provided for the use of gene class symbols with more than 4 letters. Symbols may describe a mutant phenotype or some aspect of gene structure or function.

Please use ALL UPPERCASE letters for Mutant Symbols (e.g. EMBRYONIC LETHAL) and all lowercase letters for Gene Product Symbols, except when referring to domains or other symbols (e.g. chloroplast J-like domain; TON1 recruiting motif).

This registration process is designed to minimize accidental duplication in gene nomenclature. The symbol being registered is compared with the Registered Symbol list, and TAIR is searched to see whether the symbol is unregistered but already published. If the symbol is already registered or published, the registration will not be accepted.

What Constitutes a Gene Class Symbol?

Gene class symbols have been divided into two major categories (mutant phenotype and gene product) to facilitate curation. Mutant phenotype symbols should be used when a mutant is available and the symbol describes some aspect of the phenotype. Gene product symbols should be used regardless of the availability of a mutant when the symbol describes some aspect of gene structure or function. The use of organism-specific prefixes such as At or Ath is discouraged because it is redundant and leads to many genes named 'Arabidopsis thaliana X'.

Examples of Mutant Phenotype Symbols:

    • ABA:   ABA DEFICIENT
    • AGE:   AUXIN-RESPONSIVE GENE EXPRESSION
    • EMB:   EMBRYO DEFECTIVE

Examples of Gene Product Symbols:

    • ADH:   alcohol dehydrogenase
    • AGL:   AGAMOUS-like
    • COR:   cold regulated

More Nomenclature documentation can be found here.
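
As a rough pre-check before registering, the length and prefix guidelines above can be sketched in a few lines. This is only an illustration, not the check TAIR runs: the function name `check_symbol` is hypothetical, the letters-only rule is an assumption (the class symbol is taken without any numeric family suffix), and the sketch cannot tell whether a symbol is already registered or published.

```python
import re


def check_symbol(symbol):
    """Return advisory notes for a proposed gene class symbol (empty if none)."""
    notes = []
    # Assumption: the class symbol itself is letters only (ABC, not ABC1).
    if not re.fullmatch(r"[A-Za-z]+", symbol):
        notes.append("use letters only for the class symbol")
        return notes
    if len(symbol) < 3:
        notes.append("symbols shorter than 3 letters are strongly discouraged")
    elif len(symbol) > 4:
        notes.append("symbols longer than 4 letters need a justification")
    # Organism-specific prefixes such as At or Ath are discouraged.
    if re.match(r"Ath?[A-Z]", symbol):
        notes.append("organism-specific prefix (At/Ath) is discouraged")
    return notes
```

For example, `check_symbol("EMB")` returns an empty list, while a 2-letter symbol or one starting with `At` followed by an uppercase letter returns a note explaining the concern.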

Gene Function Data

I think the functional annotation for a gene is incorrect. How can I correct it?

If you have data from a published article indicating that information about the expression, localization or biological role of a gene is incorrect, please contact TAIR. An example of an incorrect annotation would be if an article shows a gene to be expressed in the roots (but not shoots) and that same article is used as evidence for gene expression in shoots. If you have data that contradict other published works, we strongly encourage you to submit your own functional annotations and have them included in the gene record.

I know of published information on gene function that is missing from TAIR. How can I include it?

Please provide the missing gene function information using our online submission tool, GOAT. This form can be used by anyone to submit published data, whether you are an author of the publication or not. You will need to log in before filling out the form so that we can associate your name with the submission. If you experience difficulty or have information that will not fit within the form, please contact us.

More detailed instructions:

Authors are encouraged to submit their gene function data to TAIR at the time of publication. We also welcome submission of data from older articles by any community member, whether or not you are an author on the article. Gene function data accepted by this form include molecular function (for example, protein kinase), localization (cellular, sub-cellular or gross anatomy), biological role (for example, seed development), interacting partners, and comments (gene summaries). If you have other types of data to submit, please choose the appropriate form from our Submit Overview page.

A curator will review every submission before the data are made public, and the data will not be released until the relevant publication is published.

The online submission form has been tested with Chrome and Firefox. Please use one of those browsers.

To use our Generic Online Annotation Tool (GOAT), you must have an ORCiD. If you have not already done so, please create one at the ORCID website. GOAT is a ‘paper-based’ curation form, used for curating experimental gene function from a published, peer-reviewed research article, so you will also need to have the PubMed ID or Digital Object Identifier (DOI) for your paper handy. When you click on the button below, GOAT will start and prompt you for your ORCiD and password. Using GOAT, you can enter multiple loci, each with its own set of functional annotations, and submit them all at once to the TAIR curators.

To get started, click the link below and follow these steps, or watch a YouTube tutorial first.



  1. Log in with your ORCiD
  2. Click on the Submission link in the upper left menubar to start a new submission or continue one in progress
  3. Enter the DOI or PubMed ID for the paper you are curating
  4. Enter IDs for every locus you are curating from the paper. GOAT accepts UniProt IDs, AGI Locus IDs and RNAcentral IDs
  5. Add your annotations. Choose the type of annotation from the drop-down menu. You can choose to make Gene Ontology annotations (cellular component, molecular function, biological process), Plant Ontology annotations (plant developmental stages or anatomy), protein interactions or comments
  6. Review your annotations before you submit
  7. After you complete your submission, TAIR curators will review the information. If we have any questions, we will get in touch with you

We also provide a preformatted Excel spreadsheet. Please click on the link to download the spreadsheet. Required fields are blue. Fields with an asterisk (*) can have more than one entry, separated by a pipe with no intervening spaces (e.g. Ecker,Joseph|Bell,Callum). If you are submitting a manuscript to The Plant Journal, please include this file in your TPJ submission as a supplemental file.

File Format

Field | Description | Values/Constraints | Examples
Contact Person/Submitter * | Name of the person we can contact if we have questions about the annotation | none | Eva Huala
Locus Identifier | AGI locus id or name of genetic locus | At#g#####, see AGI coding convention if the gene is new, splitting or merging existing genes | AAAP or At1g10010
Gene Name | symbol-based name of gene, if this exists | See nomenclature guidelines | ETO1
Reference * | PubMedID or citation | PubMed ID can be obtained from TAIR Publication Search or PubMed | PubMedID:14576282; Smith (2006) Science 123:23; DOI:10.1111/j.1365-313x.2010.04399.x; submitted to Plant Journal
Gene function, process, location or interacting partner | functional annotation that you'd like to make | For interacting partners, please use the AGI code (At1g01010.1) in addition to the gene name | ser/thr kinase, outer mitochondrial membrane, anther, gynoecium morphogenesis, At1g01010.1(ABC1)
Method description | Method and evidence used to support the functional annotation | A list of method descriptions is given as the second sheet of the Excel workbook | enzyme assay, mutant phenotype, transcript levels, yeast two-hybrid-assay
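
Two of the conventions above lend themselves to a quick self-check before you send the spreadsheet: pipe-separated entries in starred fields and the At#g##### locus pattern. The sketch below is illustrative only. The names `split_multi` and `is_agi_locus` are hypothetical, and allowing C and M (chloroplast and mitochondrial codes) alongside chromosomes 1 to 5 is an assumption that goes beyond the At#g##### pattern stated here.

```python
import re

# Assumed AGI locus pattern: At, chromosome (1-5, C or M), g, five digits.
AGI_LOCUS = re.compile(r"AT[1-5CM]G\d{5}", re.IGNORECASE)


def split_multi(value):
    """Split a starred field on pipes, rejecting empty or space-padded entries."""
    entries = value.split("|")
    for entry in entries:
        if not entry or entry != entry.strip():
            raise ValueError(f"unexpected spacing or empty entry in {value!r}")
    return entries


def is_agi_locus(identifier):
    """True if the identifier looks like an AGI locus code such as At1g10010."""
    return AGI_LOCUS.fullmatch(identifier) is not None
```

Note that `is_agi_locus` returns False for a genetic locus name such as AAAP, which is still an acceptable Locus Identifier, and for gene model names such as At1g01010.1; it only distinguishes the two kinds of identifier.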

Gene Structure Additions/Modifications

tl;dr Email us.


I have identified a new gene not in the current genome release. How can I add that information to TAIR?

Please contact TAIR to request a new locus identifier (i.e. AGI Locus code). Once you have the new identifier, please contact TAIR to provide any information you have about the function or expression pattern of the new gene. If you are registered at TAIR, you can submit functional annotation information directly or register a new gene symbol. Registration is free.

I think the structure of a gene in the current annotation is incorrect. How can I correct it?

If you have found missing information or discrepancies in the existing Col-0 gene structures, we would like to include your gene model in our database.

Please submit your data in one of the following ways:

  1. Send tab-delimited data from any program. If you want to create your own file, please use the format below and send the files to: curator@arabidopsis.org. Fields indicated in bold are required. Please include the column headers as the first line of your files.

Description file

Field | Description | Values/Constraints | Example
Chromosome based name |  | At#g#####, see AGI coding convention if the gene is new, splitting or merging existing genes | AT1G23450
Gene name | gene symbolic name and/or full name | symbolic names and full names should be listed in one column and separated with a colon | AG:AGAMOUS
Genomic coordinates | chromosome number, start and stop coordinates on the genome, if you have mapped them |  | Chr1:nnnnnn..nnnnnnn
Transcript Sequence |  |  | gaacaacattgagaagtcatgtaatgt
Transcript sequence type |  | cDNA, EST, RNAseq, other |
Protein Sequence | The protein sequence will help us determine the correct translational start and stop | only for protein-coding genes | MGLVNEVELKSLLEQETDSP
GenBank accession |  |  | AAF79505
Method description | method used to derive the structural annotation |  | full-length cDNA clone sequencing
Publication | PMID or DOI |  |
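
The coordinate and gene name formats in the description file can be checked with a short script before submission. This is a sketch under assumptions: the function names are hypothetical, the chromosome may be 1 to 5, C or M (an assumption beyond the Chr1 example), and the coordinates used to try it out are made up.

```python
import re

# Pattern for values such as Chr1:nnnnnn..nnnnnnn
COORDS = re.compile(r"Chr([1-5CM]):(\d+)\.\.(\d+)")


def parse_coordinates(text):
    """Return (chromosome, start, stop) from a Chr#:start..stop string."""
    match = COORDS.fullmatch(text)
    if match is None:
        raise ValueError(f"not in Chr#:start..stop form: {text!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))


def parse_gene_name(text):
    """Split 'SYMBOL:FULL NAME' at the first colon; full name may be empty."""
    symbol, _, full_name = text.partition(":")
    return symbol, full_name
```

The sketch deliberately does not require start to be smaller than stop, since this page does not say how coordinates for genes on the reverse strand should be ordered.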


Marker and Polymorphism Data

If you are submitting more than 150 entries, please e-mail curator@arabidopsis.org for additional instructions. Otherwise, please submit your data using the following preformatted spreadsheet:

Download: marker_polymorphism_data_form.xls

Instructions on how to use the preformatted spreadsheet:

If available, always indicate the digest pattern/PCR length/base pair of Col-0 as a reference (not necessary for deletions/insertions).

Examples:

Marker Name | Marker Type | Polymorphism Type | Polymorphism | Restriction Site | Ecotype
MARKER1 | CAPS | digest_pattern | 400;230 | 1 | Col-0
MARKER1 | CAPS | digest_pattern | 630 | 0 | Hi-0
MARKER2 | SNPlex | SNP | G |  | Col-0
MARKER2 | SNPlex | SNP | A |  | Hi-0

In the case where a marker or polymorphism is shared by several ecotypes, please list all ecotypes in the same row, as shown in this example:

Marker Name | Marker Type | Polymorphism Type | Polymorphism | Ecotype
MARKER3 | SSLP | PCR_product_length | 165 | Col-0;Gr-6;Gu-0
MARKER3 | SSLP | PCR_product_length | 150 | RLD;Wil-1;Wt-5
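
To confirm that every marker in your file includes the Col-0 reference, rows that share ecotypes can be expanded and grouped by marker. The following is a minimal sketch, assuming each row is a dictionary keyed by the column headers shown in the example above; the function names are hypothetical. Remember that the Col-0 reference is not necessary for deletions and insertions, so a marker flagged here is not automatically an error.

```python
from collections import defaultdict


def ecotypes_by_marker(rows):
    """Map each marker name to the set of ecotypes listed across its rows."""
    seen = defaultdict(set)
    for row in rows:
        seen[row["Marker Name"]].update(row["Ecotype"].split(";"))
    return dict(seen)


def markers_missing_reference(rows, reference="Col-0"):
    """Return the sorted marker names for which no row lists the reference."""
    grouped = ecotypes_by_marker(rows)
    return sorted(name for name, ecos in grouped.items() if reference not in ecos)


# The MARKER3 rows from the example above.
rows = [
    {"Marker Name": "MARKER3", "Polymorphism": "165", "Ecotype": "Col-0;Gr-6;Gu-0"},
    {"Marker Name": "MARKER3", "Polymorphism": "150", "Ecotype": "RLD;Wil-1;Wt-5"},
]
```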

Fields indicated in blue are required. Fields marked with an asterisk (*) can contain multiple entries but these must be separated by a semicolon with no intervening spaces (e.g. Col-0;Ler-0). Note that examples have been included in the column headers; please retain headers as the first line in your files. You can find your Community ID and the Citation/Reference ID by using the links to our TAIR search pages provided in the template below.

Field | Description | Values/Constraints | Example
Contact Person Name* | Last name,First name | No space between first and last names or between semicolon | Buschmann,Henrik
Contact Person Community ID* | unique identifier for a community member in TAIR. You must be registered to have a community ID | TAIR accession from TAIR Community Search | community:1035
Contact Person E-mail Address | current email address |  | buschmann@gsf.de
Marker/Polymorphism Name |  |  | PT1 or Salk_0228
Alternative Names* |  |  | AtPT1
Chromosome | The chromosome in which the marker/polymorphism is found | 1,2,3,4,5 or unknown | 5
Genetic Marker Type |  | CAPS, SSLP, AFLP, RFLP, RAPD, SNPlex or Taqman | CAPS
Polymorphism Type | Dependent on Marker Type | Digest_pattern (CAPS or RFLP), PCR_product_length (SSLP, AFLP, or RAPD), substitution, insertion, deletion | Digest_pattern
Polymorphism Lengths * | Length of digest or PCR product | numeric, in kb or bp | 0.12, 120
Polymorphism Length Units |  | bp or kb | kb
Flank Sequence Type | Required for all but RFLP submissions | PCR_primer or Flank | flank
Flank Sequence 1 | Required for all but RFLP submissions. The first PCR primer sequence |  | ATGGTGCCGTGACGT
Flank Sequence 2 | Required for all but RFLP submissions. The second PCR primer sequence |  | AATTGGGTGTGCTAG
Special Conditions | Indicate any special conditions required for marker detection |  | annealing temp 62C
Restriction Enzyme Name* | Name of the restriction enzyme used to detect the polymorphism (required for CAPS, AFLP, or RFLP markers). | consult REBASE for standard enzyme name. | HindIII;EcoRV
Restriction Enzyme Number of sites* | Number of recognition sites for each restriction enzyme. The order of number of sites should match the order of restriction enzyme names in previous column. | numeric | 3
Ecotype/Accession Name* | The ecotype which shows the specific polymorphism. If more than one ecotype shares a polymorphism, list all ecotypes. | Col-0 should always be shown as reference | Col-0;RLD;Ler
Accession Stock ID | For accessions that correspond to an ABRC stock, this is the stock ID number for that accession. |  | CS3455;CS3444
Polymorphic Sequence | Sequence of the polymorphic region for the given ecotype/accession |  | ATGTGGCCTCTT
Map Position1 | Position of the SNP, insertion site, start position of 5' PCR primer/flanking sequence |  | 20215474
Map Position2 | End position of 3' PCR primer/flanking sequence |  | 26877049
TAIR Version | version of the TAIR genome assembly that your experiment is based on |  | TAIR9
Inheritance | The mode of inheritance of the polymorphism | recessive, dominant, co-dominant, semi-dominant, unknown | dominant
Marker/Polymorphism Citation/Reference_id | For a reference that describes the marker/polymorphism, this is the unique identifier in TAIR. For articles referring to the markers that are not yet in TAIR, please contact a curator to update this information into the database when the paper is published. | TAIR accession from TAIR Publication Search | publication:12344
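
Several of the fields above depend on the marker type. A small script can catch the most common mismatches before you submit. This sketch encodes only the dependencies stated in the table (polymorphism type for CAPS, RFLP, SSLP, AFLP and RAPD; restriction enzyme required for CAPS, AFLP and RFLP; flank sequences required for all but RFLP). The function name `check_marker_row` and the dictionary-per-row layout are assumptions, and other marker types such as SNPlex or Taqman are left unchecked because their allowed polymorphism types are not spelled out here.

```python
# Polymorphism type expected for each marker type, as stated in the table.
EXPECTED_POLYMORPHISM = {
    "CAPS": "digest_pattern",
    "RFLP": "digest_pattern",
    "SSLP": "pcr_product_length",
    "AFLP": "pcr_product_length",
    "RAPD": "pcr_product_length",
}
ENZYME_REQUIRED = {"CAPS", "AFLP", "RFLP"}


def check_marker_row(row):
    """Return a list of problems found in one marker/polymorphism row."""
    problems = []
    marker = row.get("Genetic Marker Type", "")
    poly = row.get("Polymorphism Type", "").lower()
    expected = EXPECTED_POLYMORPHISM.get(marker)
    if expected and poly != expected:
        problems.append(f"{marker} markers expect polymorphism type {expected}")
    enzymes = [e for e in row.get("Restriction Enzyme Name", "").split(";") if e]
    if marker in ENZYME_REQUIRED and not enzymes:
        problems.append(f"{marker} markers require a restriction enzyme name")
    has_flanks = row.get("Flank Sequence 1") and row.get("Flank Sequence 2")
    if marker != "RFLP" and not has_flanks:
        problems.append("both flank sequences are required for non-RFLP markers")
    return problems
```

An empty list means the row passed these particular checks; it does not replace curator review.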

Phenotypes

We accept phenotype data for all mapped and/or sequenced Arabidopsis mutants, including existing ABRC stocks, stocks from other stock centers, and lines that have not been deposited in a public repository. If you are submitting phenotype data for stocks not in ABRC we encourage you to consider also submitting the seed stock.

Please submit your phenotype data in one of the following ways:

Option 1) Use a preformatted Excel spreadsheet. To submit your data, please download and complete the following Excel spreadsheet. Examples are provided for each column.

Download: phenotype_data_form.xls

Option 2) Send tab-delimited data from any program. If you want to create your own file, please use the format below and send the files to: curator@arabidopsis.org. Please include the column headers as the first line of your files.

Description of column headers


Field | Description
Contact Person | Please provide name and email of the contact person/corresponding author
Reference (Pubmed ID, DOI, or other format) | PMID: 18485063 - we prefer Pubmed ID, but the following format without Pubmed ID is OK too: Plant J (2008), 55: 798
Allele Symbol / Polymorphism Name | For example: rpp3, rpp4(SALK_12345), atswc6(SAIL_1142_C03)
Accession | Example, Ws, Col-0
Locus | If this polymorphism is an allele of a locus, enter the unique locus code here. Otherwise, leave empty. Example: AT4G32520
Mutagen | EMS, X-rays, T-DNA insertion, transposon, fast neutron etc.
Mutation Site | Please describe the mutation site (3rd intron, 2nd exon, promoter, 3'UTR, 5'UTR etc). Please also give details of the mutation if known (for instance, contains a G to A substitution at nucleotide position 123)
Genotype | Select from homozygous, heterozygous, hemizygous, and unknown.
Inheritance | Select from dominant, recessive, incompletely dominant, co-dominant, and unknown.
Allele Mode | Select from hypermorphic, hypomorphic, haplo-insufficient, antimorphic, gain-of-function, loss-of-function, and unknown.
Phenotype | Please provide a brief description of the mutant phenotype. Note: please specify if the phenotype is for double/triple mutants (for example, mop3/mop4).
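
Three of the columns above take values from a fixed list, so they are easy to verify in a script. The sketch below assumes each row is a dictionary keyed by the column headers and that matching is case-insensitive; both are assumptions, and `invalid_fields` is a hypothetical helper name.

```python
# Allowed values copied from the column descriptions above.
CONTROLLED = {
    "Genotype": {"homozygous", "heterozygous", "hemizygous", "unknown"},
    "Inheritance": {
        "dominant", "recessive", "incompletely dominant", "co-dominant", "unknown",
    },
    "Allele Mode": {
        "hypermorphic", "hypomorphic", "haplo-insufficient", "antimorphic",
        "gain-of-function", "loss-of-function", "unknown",
    },
}


def invalid_fields(row):
    """Return the names of controlled columns whose value is not allowed."""
    return sorted(
        field
        for field, allowed in CONTROLLED.items()
        if row.get(field, "").strip().lower() not in allowed
    )
```

A row with a missing or misspelled value in any of the three columns is reported by column name, so it can be corrected or set to `unknown`.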

Protocols

TAIR encourages the research community to share their protocols. To submit a protocol to TAIR, send the protocol in one of the following formats.

Portable Document Format (.pdf)
Microsoft Word Document (.doc)
Image, e.g. for PowerPoint/Photoshop images (.gif and .jpg)
Any text file (.txt and .rtf)

Along with the above file, please send additional data in the format below. Fields indicated in blue are required. Fields marked with an asterisk (*) can have more than one entry, but the entries must be separated by a semicolon.

To find your Community ID and the Citation/Reference ID please follow the provided links to our TAIR search pages.


Field | Description | Values/Constraints | Example
Author(s)* | full name | 1000 character limit | Jerome Giraudat
Submitter's Community ID* | unique identifier for a community member in TAIR. You must be registered to have a community ID | TAIR accession from TAIR Community Search | community:4851
Title | Title of the protocol | 500 character limit | Rapid RNA preparation
Description | brief text description | 1000 character limit | A method for rapidly preparing RNA from leaf tissue.
Publication/ Citation/ Reference_id | For a reference that describes the protocol, this is the unique identifier in TAIR. For articles referring to the markers that are not yet in TAIR, please contact a curator to update this information into the database when the paper is published. | TAIR accession from TAIR Publication Search | publication:501682410
Usage keywords* | One or more keywords that describe the application(s) of the method (e.g. gene expression assay, protein-protein interaction assay). | separate multiple keywords with semicolons | gene expression assay
Method keywords* | use as many keywords as necessary to indicate the methods described in the protocol (e.g. mRNA isolation, cell fractionation). | separate multiple keywords with semicolons | Northern gel blot; RNA extraction
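
The character limits and semicolon-separated keyword fields above can be checked before sending the protocol metadata. This is a minimal sketch, assuming the metadata is held in a dictionary keyed by field name; `over_limit` and `split_keywords` are hypothetical helper names, and trimming the space after a semicolon follows the "Northern gel blot; RNA extraction" example rather than any stated rule.

```python
# Character limits copied from the Values/Constraints column above.
LIMITS = {"Author(s)": 1000, "Title": 500, "Description": 1000}


def over_limit(record):
    """Return the fields whose text exceeds the stated character limit."""
    return [
        field for field, limit in LIMITS.items()
        if len(record.get(field, "")) > limit
    ]


def split_keywords(value):
    """Split a keyword field on semicolons, dropping blanks and padding."""
    return [part.strip() for part in value.split(";") if part.strip()]
```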