...
Subfamilies within each family are groups of genes that share a particularly high degree of protein sequence similarity due to limited divergence from their common ancestor (https://www.ncbi.nlm.nih.gov/pubmed/26578592). Subfamilies are, in general, groups of closely related orthologs. In the PANTHER tree building process, a new subfamily is created within a family after every gene duplication or horizontal transfer event. After horizontal transfer, the transferred copy becomes the founder of a new subfamily; the vertically inherited copy remains in the original subfamily. After gene duplication, the copy that changes faster in sequence immediately following the duplication becomes the founder of a new subfamily; the slower-evolving copy remains in the original subfamily. There are two exceptions to this rule: (1) because of the high frequency of gene duplication prior to the vertebrate common ancestor, each vertebrate copy following a gene duplication event founds a new subfamily, and (2) duplicated genes do not found a subfamily if they did not lead to orthologs in at least two extant species.
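The duplication rule and its two exceptions can be sketched in code. This is a minimal illustration, not the PANTHER implementation: the Copy class, its rate and extant_species fields, and the vertebrate_exception flag are all hypothetical names introduced here, and the way exception (1) is triggered is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Copy:
    label: str
    rate: float          # hypothetical measure of sequence change right after the duplication
    extant_species: int  # extant species that have orthologs descending from this copy


def new_subfamily_founders(copies: List[Copy], vertebrate_exception: bool = False) -> List[str]:
    """Return labels of the copies that found a new subfamily after a duplication."""
    # Exception (2): a copy must lead to orthologs in at least two extant species.
    eligible = [c for c in copies if c.extant_species >= 2]
    if vertebrate_exception:
        # Exception (1): every eligible copy founds a new subfamily.
        return [c.label for c in eligible]
    if len(copies) < 2:
        return []
    # Default rule: the slowest-evolving copy stays in the original subfamily.
    slowest = min(copies, key=lambda c: c.rate)
    return [c.label for c in eligible if c is not slowest]


slow = Copy("copy1", rate=0.1, extant_species=5)
fast = Copy("copy2", rate=0.4, extant_species=3)
print(new_subfamily_founders([slow, fast]))
print(new_subfamily_founders([slow, fast], vertebrate_exception=True))
```

With the default rule only copy2 (the faster copy) founds a new subfamily; with the exception flag set, both copies do.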
How are gene families named? Why are some families not named?
Biologically meaningful family names are assigned by biologist curators (https://www.ncbi.nlm.nih.gov/pubmed/12952881). The curator either assigns the family a general functional name that applies to all genes in the family (e.g., NUCLEAR HORMONE RECEPTOR) or takes the name of the largest subfamily (Y) and names the family Y-RELATED. In the latter case, if the largest subfamily is “unnamed”, the family is not named.
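The naming rule above can be written as a short function. This is only a sketch of the logic as described: the function name, its arguments, and the reading of "largest" as "most members" are assumptions, and in practice the general name is a curator's decision rather than an input value.

```python
from typing import List, Optional, Tuple


def name_family(general_name: Optional[str],
                subfamilies: List[Tuple[str, int]]) -> Optional[str]:
    """Return a family name, or None if the family stays unnamed.

    subfamilies is a list of (subfamily_name, member_count) pairs.
    """
    # Case 1: the curator supplied a general functional name.
    if general_name:
        return general_name
    if not subfamilies:
        return None
    # Case 2: fall back to the largest subfamily (assumed: most members).
    largest_name, _ = max(subfamilies, key=lambda s: s[1])
    if largest_name == "unnamed":
        return None
    return f"{largest_name}-RELATED"


print(name_family("NUCLEAR HORMONE RECEPTOR", [("unnamed", 12)]))
print(name_family(None, [("KINASE X", 10), ("unnamed", 3)]))
print(name_family(None, [("unnamed", 10), ("KINASE X", 3)]))
```

KINASE X is a made-up subfamily name used only for illustration.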
How are subfamilies named? Why are some subfamilies not named?
The name of a subfamily is transferred from the representative member of the subfamily (https://www.ncbi.nlm.nih.gov/pubmed/26578592). If a subfamily contains an annotated SwissProt entry from any of the 12 model organisms (human, mouse, rat, chicken, zebrafish, fruit fly, C. elegans, budding yeast, fission yeast, D. discoideum, Arabidopsis, and E. coli), then the curated ‘protein name’ of that entry is used to name the subfamily. If a subfamily does not contain any SwissProt entries from the model organisms, but contains SwissProt entries from other organisms, the most common SwissProt protein name is used as the subfamily name. If a subfamily does not contain any members from the SwissProt database, a protein name from a TrEMBL entry is automatically selected as the subfamily name. If no name can be found, the subfamily is labeled ‘unnamed’.
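The fallback order described above can be sketched as follows. The Member class and its fields are hypothetical, and the choice of the first matching entry when several model-organism or TrEMBL entries exist is an assumption; the text does not say how such cases are resolved.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional

# The 12 model organisms listed in the answer above.
MODEL_ORGANISMS = {
    "human", "mouse", "rat", "chicken", "zebrafish", "fruit fly",
    "C. elegans", "budding yeast", "fission yeast", "D. discoideum",
    "Arabidopsis", "E. coli",
}


@dataclass
class Member:
    organism: str
    database: str                # "SwissProt" or "TrEMBL"
    protein_name: Optional[str]  # None if the entry has no usable name


def name_subfamily(members: List[Member]) -> str:
    swissprot = [m for m in members if m.database == "SwissProt" and m.protein_name]
    # 1. Curated name from a model-organism SwissProt entry (assumed: first match).
    model = [m for m in swissprot if m.organism in MODEL_ORGANISMS]
    if model:
        return model[0].protein_name
    # 2. Most common SwissProt protein name among other organisms.
    if swissprot:
        return Counter(m.protein_name for m in swissprot).most_common(1)[0][0]
    # 3. A name taken from a TrEMBL entry (assumed: first one that has a name).
    trembl = [m for m in members if m.database == "TrEMBL" and m.protein_name]
    if trembl:
        return trembl[0].protein_name
    # 4. Nothing usable was found.
    return "unnamed"


members = [
    Member("rice", "SwissProt", "Protein A"),
    Member("maize", "SwissProt", "Protein B"),
    Member("poplar", "SwissProt", "Protein B"),
]
print(name_subfamily(members))
```

Here none of the entries comes from a model organism, so the most common SwissProt name, Protein B, is chosen. The protein and organism values are invented for the example.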
What is horizontal gene transfer? How is it detected in PANTHER gene trees?
Horizontal transfer is the movement of genetic material between organisms other than by the default "vertical" transmission from parent to offspring. Horizontal transfer is common in bacteria. In eukaryotes, events such as the engulfment of the mitochondrion by the proto-eukaryotic cell are represented at the level of gene family trees as horizontal transfer from a proteobacterial ancestor to the eukaryotic common ancestor. At each step in PANTHER gene tree building, the GIGA algorithm considers the number of gene deletions that would be implied by a history of vertical inheritance; if that number is too large, a horizontal transfer event is considered the more likely interpretation (https://www.ncbi.nlm.nih.gov/pubmed/26578592).
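To illustrate the idea of counting implied deletions, the sketch below counts the losses needed to explain a gene's presence pattern if the gene had been inherited vertically from the root of a species tree. This is not the GIGA implementation: the nested-tuple tree, the species labels A to G, the assumption that the gene was present at the root, and the MAX_LOSSES cutoff are all hypothetical choices made for the example.

```python
def count_implied_losses(tree, present):
    """Count gene losses implied by purely vertical inheritance from the root.

    tree is a nested tuple of species names (leaves are strings);
    present is the set of species that carry the gene.
    """
    def visit(node):
        # Returns (clade_has_gene, losses_inside_clade).
        if isinstance(node, str):
            return node in present, 0
        results = [visit(child) for child in node]
        if not any(has for has, _ in results):
            # Whole clade lacks the gene: a single loss is charged at the parent.
            return False, 0
        inner = sum(losses for has, losses in results if has)
        lost_children = sum(1 for has, _ in results if not has)
        return True, inner + lost_children

    has_gene, losses = visit(tree)
    return losses if has_gene else 0


MAX_LOSSES = 3  # hypothetical cutoff; the actual criterion is not stated here

species_tree = ((("A", "B"), ("C", "D")), ("E", ("F", "G")))
losses = count_implied_losses(species_tree, {"A", "G"})
print(losses, "-> horizontal transfer candidate" if losses > MAX_LOSSES else "-> vertical")
```

A gene found only in the distant species A and G requires four separate losses under vertical inheritance, which exceeds the example cutoff, whereas a gene found in A, B, C and D requires only one.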
Are polyploid organisms represented in gene families?
Several polyploid organisms are included in PhyloGenes. Just like genes from the single genome of a diploid, genes from the N (N>1) genomes of a polyploid are treated as individual genes. For example, bread wheat is a hexaploid with three genomes: A, B and D. If GeneX is present in all three genomes, bread wheat has three copies of GeneX: GeneX-A, GeneX-B and GeneX-D. Given the high sequence similarity among the three genes, GeneX-A, GeneX-B and GeneX-D are very likely to be found in the same gene tree, shown as descendants of a gene duplication event.