Using the Protein Search

TAIR's Protein search allows you to search for proteins using a variety of parameters. You can perform a simple search by name, restrict your search to proteins having specific physio-chemical properties and domains, or limit your search to proteins encoded in specified regions of the genome.

Search by Name

You can search for proteins by the following names:

  • Locus name: For sequenced genes, the locus name corresponds to the ORF name determined by the AGI ORF naming convention. For genetic loci (e.g., genes identified by mutation but not yet associated with a sequence), the name corresponds to the accepted symbolic name. AGI ORF names have the format AT(1-5 or C,M)gXXXXX, where the value in parentheses corresponds to the chromosome number or organelle genome.
  • Gene Symbol: This is the symbolic name of the locus.
  • GenPeptID: Use this if you know the unique GenBank identifier for the protein.
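
The AGI locus name format described in the list above can be checked with a regular expression. The following is a minimal sketch, assuming the numeric part is five digits (as in AT1G01010) and that names are matched case-insensitively. The function name `parse_agi_locus` is hypothetical and is not part of TAIR.

```python
import re

# Assumed pattern: "AT", a chromosome (1-5) or organelle (C, M) code,
# "G", then five digits. The five-digit length is an assumption.
AGI_LOCUS = re.compile(r"AT([1-5CM])G(\d{5})", re.IGNORECASE)


def parse_agi_locus(name):
    """Return the chromosome/organelle code of an AGI locus name, or None."""
    match = AGI_LOCUS.fullmatch(name.strip())
    if match is None:
        return None
    return match.group(1).upper()
```

A name such as AT1G01010 yields "1", while a name with a chromosome code outside 1-5, C, or M yields None.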

Search by Structural Class Type

This feature allows you to restrict your search to include only those proteins belonging to the specified structural class. You may select multiple options within each parameter by clicking on one selection and then clicking on additional ones while holding down either the CTRL key (PCs) or the Apple key (Mac).

Structural class assignment was performed from annotations of SCOP's superfamilies using hidden Markov models (HMMs) against TIGR's 4.0 Release by Drs. Julian Gough and Martin Madera at the SCOP database. More information can be found in the following papers:

Gough, J., Hughey, R., Karplus, K., and Chothia, C. (2001). Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure. J Mol Biol. 2001 Nov 2;313(4):903-19.

Gough, J., and Chothia, C. (2002). SUPERFAMILY: HMMs representing all proteins of known structure. SCOP sequence searches, alignments, and genome assignments. Nucl. Acids Res. 2002 Jan 1;30(1):268-72.

Search by Gene List

Use this option to make bulk queries using AGI locus IDs.

Search by Physio-chemical Properties

You can limit your search to include only proteins having specific physical/chemical properties.


Length

The calculated length of the translated protein in amino acids. This does not include lengths after processing such as cleavage of signal peptides.

Calculated MW

The predicted molecular weight in kilodaltons.

Calculated PI

The predicted isoelectric point (pI). The isoelectric point is the pH at which a protein has a net charge of zero, for example on an isoelectric focusing gel. The pI of a protein is determined by its amino acid composition and the net contribution of the positive and negative charges of the side chains. Values are between pH 1 and 14.
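
To illustrate how amino acid composition determines the pI, the sketch below sums the charges of the ionizable groups at a given pH and then searches for the pH at which the net charge is zero. It is not the method TAIR uses to calculate pI. The pKa values are approximate, illustrative assumptions (published scales differ), and the function names are hypothetical.

```python
# Approximate pKa values, used here only for illustration (assumed).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of a protein sequence at the given pH."""
    seq = sequence.upper()
    # Basic groups carry +1 when protonated.
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))
    for residue, pka in POSITIVE_PKA.items():
        charge += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    # Acidic groups carry -1 when deprotonated.
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))
    for residue, pka in NEGATIVE_PKA.items():
        charge -= seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


def estimate_pi(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Bisect for the pH at which the net charge crosses zero."""
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid  # still positively charged, so the pI is higher
        else:
            high = mid
    return (low + high) / 2.0
```

Because the net charge only decreases as the pH rises, bisection is sufficient. A lysine-rich sequence gives a basic pI and an aspartate-rich sequence gives an acidic one.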

Domains

Use this option if you want to restrict your search to include only proteins having a specified domain composition. The drop-down menus allow you to select the number of occurrences of each domain along with the syntax for identifying the domain. For example, to search for proteins that have one or more occurrences of domain IPR045176, do the following:

  • Select the greater than symbol (>) in the first column.
  • Enter zero ('0') in the adjacent input box.
  • Select INTERPRO from the domain type drop-down menu.
  • Choose exact match.
  • Enter the name of the domain (IPR045176).

If searching for more than one domain, the search is treated as a logical AND. Therefore, entering INTERPRO domain IPR045176 AND ALPHAFOLD domain AFA0A1I9LRD1F1 will limit your search to proteins having BOTH domains.
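
The logical AND across domain criteria can be sketched as follows. This is an illustration only, not TAIR's implementation: the protein records are hypothetical, and the only comparison symbol named on this page is "greater than", so the other symbols are assumptions.

```python
import operator

# Only ">" is named on this page; "=" and "<" are assumed for illustration.
COMPARATORS = {">": operator.gt, "=": operator.eq, "<": operator.lt}


def matches_all(domain_counts, criteria):
    """Return True only if every (domain_id, symbol, count) criterion holds."""
    return all(
        COMPARATORS[symbol](domain_counts.get(domain_id, 0), count)
        for domain_id, symbol, count in criteria
    )


# Hypothetical records: protein name -> {domain identifier: occurrences}.
proteins = {
    "protein_A": {"IPR045176": 2, "AFA0A1I9LRD1F1": 1},
    "protein_B": {"IPR045176": 1},
    "protein_C": {"AFA0A1I9LRD1F1": 3},
}

# "One or more occurrences" is expressed as "greater than 0".
criteria = [("IPR045176", ">", 0), ("AFA0A1I9LRD1F1", ">", 0)]
hits = [name for name, counts in proteins.items() if matches_all(counts, criteria)]
```

Only `protein_A` has both domains, so it is the only hit. An empty list of criteria matches every protein.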

Protein domains are conserved regions of amino acid or structural similarity in protein sequences. Domains generally represent functional units having some form of biological activity. Domains are useful in grouping proteins with little overall sequence similarity. This search allows you to specify both the type and number of domains and to use either the domain name or the unique domain identifier. Leaving this option blank will include all proteins in your search. The INTERPRO databases are the source of the data for this information. For more information see: INTERPRO database.

Restrict by Time

Use this option to limit your search to protein records that are new or have been updated within the specified time period.

Restrict by Location

This option allows you to restrict your search to proteins encoded by loci on a particular chromosome and within a specified range. You can search any one of the five nuclear chromosomes or any of the organellar genomes (mitochondrial and chloroplast).

MapType

The available map for searching proteins is the AGI sequence map.

Range

Lets you specify a range by its upper and lower bounds (when you select "Between") or by a center point (when you select "Around"). The value is a physical distance (kb). When you select "Between" from the drop-down menu, your search covers the range defined by two entities or positions on a particular map. When you select "Around" from the drop-down menu, your search covers the area +/- 10 cM and/or +/- 100 kb from the specified entity or position. When you select "Around", the second value input and units options are disabled.
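
The difference between "Between" and "Around" can be sketched as a small interval calculation. This is a hedged illustration for the physical (kb) case only. The function names are hypothetical, and clamping the lower bound at zero is an assumption rather than documented behavior.

```python
AROUND_WINDOW_KB = 100.0  # the +/- 100 kb window described above


def search_interval(mode, first_kb, second_kb=None):
    """Return the (lower, upper) bounds in kb for a range search."""
    if mode == "Between":
        if second_kb is None:
            raise ValueError('"Between" needs two positions')
        return (min(first_kb, second_kb), max(first_kb, second_kb))
    if mode == "Around":
        # The second value is ignored, mirroring the disabled input box.
        lower = max(0.0, first_kb - AROUND_WINDOW_KB)  # clamping is assumed
        return (lower, first_kb + AROUND_WINDOW_KB)
    raise ValueError("unknown range mode: " + repr(mode))


def in_interval(position_kb, interval):
    """Return True if a position lies within the inclusive interval."""
    lower, upper = interval
    return lower <= position_kb <= upper
```

For example, "Between" with positions 500 and 200 covers 200 to 500 kb, while "Around" with position 1000 covers 900 to 1100 kb.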

Output Options

Number of records

You can choose to display 25, 50, 100 or 200 result items on a single page. Displaying more results per page takes longer to load.

Sort by

You can choose to sort the results by protein name, position of the locus, or locus name.
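
The output options amount to sorting the results by one key and then slicing out a page. The sketch below assumes hypothetical record fields (`protein_name`, `locus_name`, `chromosome`, `start_kb`) and a hypothetical function name. It does not describe how TAIR stores or returns results.

```python
PAGE_SIZES = (25, 50, 100, 200)

# Sort keys for the three options; the record field names are assumptions.
SORT_KEYS = {
    "protein name": lambda record: record["protein_name"].lower(),
    "position": lambda record: (record["chromosome"], record["start_kb"]),
    "locus name": lambda record: record["locus_name"].upper(),
}


def result_page(records, sort_by, page_size, page_number=1):
    """Sort records by the chosen key and return one page of results."""
    if page_size not in PAGE_SIZES:
        raise ValueError("page size must be one of " + str(PAGE_SIZES))
    if sort_by not in SORT_KEYS:
        raise ValueError("unknown sort option: " + repr(sort_by))
    ordered = sorted(records, key=SORT_KEYS[sort_by])
    start = (page_number - 1) * page_size
    return ordered[start:start + page_size]
```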