The answers to some frequently asked questions can be found on this page. Use the broad categories listed below to jump to the section relevant to your question. If you cannot find the answer you are looking for, you can send email to the TAIR curators at curator@arabidopsis.org.
TAIR access issues
My institution has a subscription to TAIR but I am getting page limit messages
If you are accessing TAIR off site, ensure that you are accessing it through your library proxy as you would any other library resource.
If you are accessing TAIR on site, please contact your librarian with the IP address of the computer you are using to ensure that your address is within the registered ranges for your institution.
How to report a problem/bug on the website
...
Where can I find a list of coordinates for all genes (including UTRs, introns and exons) and other transcripts?
Please go to ftp://ftp.arabidopsis.org/home/tair/Genes/TAIR10_genome_release to find the complete list of all gene models from the TAIR10 release, or to the Downloads directory > Genes, where you will find subdirectories for all of the genome releases. These lists are available in different formats such as GFF, XML and GTF, and some contain specific subsets of genes, such as all genes whose structures changed between the TAIR9 and TAIR10 releases, or all new gene models. Coordinate information for other sequences such as ESTs and cDNAs, as well as coordinates of markers and polymorphisms, can be found at ftp://ftp.arabidopsis.org/home/tair/Maps/seqviewer_data
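As a rough illustration of how coordinates can be pulled out of a GFF-style file, here is a minimal Python sketch. It assumes the standard nine tab-separated GFF columns (sequence, source, feature type, start, end, score, strand, phase, attributes). The sample records and the `read_features` helper are hypothetical and only illustrate the layout; check the actual files in the release directory before relying on it.

```python
import io

# Illustrative GFF-style records (hypothetical sample, not copied from a release file).
SAMPLE_GFF = (
    "##gff-version 3\n"
    "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010\n"
    "Chr1\tTAIR10\tfive_prime_UTR\t3631\t3759\t.\t+\t.\tParent=AT1G01010.1\n"
    "Chr1\tTAIR10\texon\t3631\t3913\t.\t+\t.\tParent=AT1G01010.1\n"
)

def read_features(handle, wanted_types):
    """Yield (seqid, type, start, end, strand, attributes) for the wanted feature types."""
    for line in handle:
        line = line.rstrip("\n")
        # Skip blank lines and comment/directive lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        # A well-formed GFF record has exactly nine columns.
        if len(fields) != 9:
            continue
        seqid, _source, ftype, start, end, _score, strand, _phase, attrs = fields
        if ftype in wanted_types:
            yield seqid, ftype, int(start), int(end), strand, attrs

features = list(read_features(io.StringIO(SAMPLE_GFF), {"gene", "exon"}))
for feature in features:
    print(feature)
```

To use it on a downloaded file, pass an open file handle instead of the `io.StringIO` sample.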
What is the difference between genome assembly and genome annotation versions?
Genome assembly refers to the ordered nucleotide sequence of each chromosome of the reference genome. The current version is called TAIR10. Genome annotation refers to the gene calls/coordinates on the reference genome. The current version is called Araport11.
Where do the gene structural annotations in TAIR come from?
The Arabidopsis genome was initially annotated by the Arabidopsis Genome Initiative (AGI) and later reannotated by TIGR in collaboration with MIPS and TAIR. TAIR assumed primary responsibility for maintaining the Arabidopsis genome annotation in North America following TIGR's final release (TIGR5), producing 5 additional genome releases, TAIR6 through TAIR10. As of 2014, genome annotation was handled by Araport, starting with the release of Araport11 (released 2016).
Where can I find a list of GenBank accessions that correspond to AGI locus identifiers?
A file containing the mapping can be downloaded from the TAIR10_genome_release directory on the TAIR FTP site.
Why do AGI loci in TAIR differ from other sources?
This is probably due to differences in annotation methods used by TIGR and MIPS. You can see annotations in TIGR and MIPS by clicking on their respective links in the "External Links" band on TAIR locus/gene detail pages.
Related pages:
MIPS ; MATDB
TIGR ; Arabidopsis Database
I have identified a new gene that does not have an AGI locus identifier. How can I get an ID before I publish?
Please do not self-assign AGI locus IDs; contact us first.
Why do the coordinates for a gene/BAC/5'UTR etc... differ in TAIR from what I expect?
...
- The annotation may have changed between versions of the genome release. Currently we are using the Araport11 release for the SeqViewer and many BLAST datasets.
- BAC lengths in GenBank differ from AGI BACs. Many of the BAC sequences used by the AGI were trimmed to remove overlaps, whereas clones with the same name in GenBank are untrimmed and longer. Other AGI BACs were extended by TIGR, adding sequences from adjacent BACs for easier annotation of genes near BAC ends. On SeqViewer, the AGI BACs are called Assembly Units.
- Structural annotations may differ between GenBank and AGI sequences. TAIR builds gene models based on a combination of cDNA and EST sequences from GenBank. A gene's UTR may therefore be extended based on EST alignments even though the cDNA is comparatively truncated.
...
- Some gene models may not have been predicted by our annotation process because of limitations of the gene prediction software. If you are using BLAST or another sequence similarity search tool on TAIR, choose AGI whole genome (BAC clones), or GenBank whole genome (BAC clones) datasets. Unannotated genes will not be found in AGI transcript, protein, cds or gene data sets.
- "Missing" genes may reside in one of the few remaining known gaps in the sequence, including highly repetitive regions that are difficult to sequence. See the Genome Update pages for information about gaps, incomplete clone sequences and genome monitoring status.
- If you have sequenced a gene that is not in the database, please contact TAIR to update the gene structural data and to provide structural and functional information based on experimental evidence.
- Some genes which existed in earlier annotation releases may have been obsoleted, merged, or split into 2 new gene models. To find out when a gene was added or removed, or whether a gene has been split or merged, try the TAIR locus history search.
How can I obtain a list of functional categories for a set of genes?
TAIR curators annotate Arabidopsis genes using Gene Ontology terms to describe a gene's molecular function, subcellular localization and biological process. GO annotations in TAIR also include contributions from the TAIR community, UniProt and the GO consortium. To obtain all of the GO annotations for a set of genes:
- Go to Tools -> Bulk Data Retrieval -> GO Annotations, or to Advanced Search -> Gene Search.
- Paste in or upload a list of locus identifiers for your genes (e.g. AT1G23030).
- Choose html output, or text output for more than 1000 genes. If you save the html file as text from your browser, you can open the file using spreadsheet software such as Microsoft Excel.
- From the Gene Search result page, choose Get GO annotations.
- Download all results.
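Before pasting a long list of locus identifiers, it can help to tidy it first. The Python sketch below is one possible way to do that. It assumes the usual AGI locus pattern (AT, a chromosome code of 1-5, C or M, then G and five digits, as in AT1G23030). The `clean_locus_list` helper is hypothetical and not part of any TAIR tool.

```python
import re

# Assumed AGI locus pattern: "AT", chromosome code (1-5, C = chloroplast,
# M = mitochondrion), "G", then five digits, e.g. AT1G23030.
AGI_LOCUS = re.compile(r"^AT[1-5CM]G\d{5}$")

def clean_locus_list(raw_ids):
    """Return (valid, rejected); valid IDs are upper-cased and de-duplicated in input order."""
    valid, rejected, seen = [], [], set()
    for raw in raw_ids:
        locus = raw.strip().upper()
        if not locus:
            continue  # ignore blank lines
        if AGI_LOCUS.match(locus):
            if locus not in seen:
                seen.add(locus)
                valid.append(locus)
        else:
            # Gene model IDs such as AT1G23030.1 end up here, since they are not locus IDs.
            rejected.append(raw.strip())
    return valid, rejected

valid, rejected = clean_locus_list(["at1g23030", "AT1G23030 ", "AT1G23030.1", "", "ATCG00010"])
print("\n".join(valid))
```

The `valid` list can be pasted into the search form, and `rejected` shows entries that need a second look.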
Software/Analysis Tools
What are the specialized datasets available at TAIR for similarity searching (e.g. BLAST, PatMatch)?
TAIR's datasets include: AGI transcript, peptide, cds and gene sequences; introns only; insertion flank sequences; locus upstream and downstream sequences and more. For a full listing of the available data sets see: Datasets. These data sets are common to BLAST and PatMatch. You can download TAIR BLAST datasets from the Downloads> Sequences directory.
How do I interpret the error message I get when I submit a sequence?
...
Converting genetic locations to sequence locations will only give an approximate correlation. This is because the conversion depends upon both the genetic map used and the frequency of recombination, which is variable within the genome. The most accurate estimate would be obtained by comparing the Lister and Dean RI map to the genome sequence (AGI map) for a global genome comparison value. You can also visually align genetic and sequence based maps using the MapViewer tool, by aligning common markers from the genetic and sequence maps.
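One simple way to turn the idea of aligning common markers into a number is linear interpolation between the two flanking markers. The Python sketch below is only an approximation under that assumption; the marker positions are made up for illustration, and `estimate_bp` is a hypothetical helper, not a TAIR tool. Because recombination frequency varies along the genome, the result should be treated as a rough estimate.

```python
from bisect import bisect_right

# Hypothetical markers found on both maps: (genetic position in cM,
# sequence position in bp), sorted by genetic position.
COMMON_MARKERS = [(0.0, 100_000), (10.0, 2_100_000), (25.0, 6_600_000)]

def estimate_bp(cm, markers=COMMON_MARKERS):
    """Linearly interpolate a bp position between the two flanking common markers."""
    cms = [m[0] for m in markers]
    if not cms[0] <= cm <= cms[-1]:
        raise ValueError("position lies outside the range covered by common markers")
    # Index of the right-hand flanking marker (clamped so the last marker works too).
    i = min(bisect_right(cms, cm), len(markers) - 1)
    (cm_lo, bp_lo), (cm_hi, bp_hi) = markers[i - 1], markers[i]
    fraction = (cm - cm_lo) / (cm_hi - cm_lo)
    return round(bp_lo + fraction * (bp_hi - bp_lo))

print(estimate_bp(5.0))
```

The denser the set of shared markers around the region of interest, the less the local variation in recombination rate distorts the estimate.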
How up to date is the Lister/Dean RI map data in the MapViewer?
The RI map was last updated from NASC mapping data from May 2001.
Gene Expression
Please note that TAIR stopped accepting new microarray data submissions in June 2005. Newer and more comprehensive microarray data sets are available at GEO, ArrayExpress and NASCArrays.
...
You can now search for microarray experiments directly from the Microarray Elements Search. In addition, the Expression Viewer display now has direct links to the microarray experiment details. Click on the name of the hybridization in the Expression Viewer to display the information about the experiment.
How do I open microarray data files I have downloaded from TAIR?
The raw data files for microarray experiments are large and have therefore been compressed. To uncompress the microarray data files (with a .gz suffix), do the following: on Mac/UNIX/Linux, type gzip -d /home/yourname/yourpath/filename.gz at the command line in a terminal window. For example: gzip -d /home/frank/franksfiles/ciw_2000.gz. On PCs, download the WinZip utility to decompress the files.
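If you prefer a method that works the same way on any operating system, the files can also be decompressed with Python's standard library. This is a minimal sketch; the `gunzip` helper name is our own, and the paths are placeholders like those in the example above.

```python
import gzip
import shutil
from pathlib import Path

def gunzip(src):
    """Decompress a .gz file next to itself and return the path of the output file."""
    src = Path(src)
    if src.suffix != ".gz":
        raise ValueError(f"expected a .gz file, got {src.name}")
    dest = src.with_suffix("")  # e.g. ciw_2000.gz -> ciw_2000
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        # Copy in chunks so large raw data files are never held fully in memory.
        shutil.copyfileobj(fin, fout)
    return dest
```

For example, gunzip("/home/frank/franksfiles/ciw_2000.gz") would write /home/frank/franksfiles/ciw_2000. Unlike gzip -d, this sketch leaves the original .gz file in place.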
...
Where can I find segregation data for recombinant inbred lines?
- For the Lister and Dean ColXLer Map go to NASC RI map data and click on "Marker scores for latest map" in the sidebar.
- For the Koornneef LerXCvi lines, go to TAIR's FTP site (FTP -> Maps -> Ler_Cvi_RIdata), in the Downloads section under the Maps directory.
Why do the BAC locations given for SNPs and INDELS from the CEREON database differ from their location when I try to locate them on the BAC?
...
Each of the search results pages includes the option to obtain a listing of specific records or a set of records that you can download and open in a spreadsheet. Information about the downloaded fields can be found in the help pages. If there is a specific set of data that you would like, contact the curators and we will do our best to accommodate your request. User requests are placed in the FTP site under the User Requests directory.
Posting Jobs
How do I post a job opening at TAIR?
...
How do I submit data to TAIR?
See the Data Submission section for information on what data types TAIR accepts directly, and for instructions on how to submit Marker/Polymorphism, Gene Family, Functional Genomics Gene Lists and other data to TAIR.
Gene nomenclature
Before I name my gene, how can I find out if a gene name is in use?
...