PhylomeDB
NAR Molecular Biology Database Collection entry number 1113
Gabaldon, Toni; Huerta-Cepas, Jaime; Capella-Gutierrez, Salvador; Pryszcz, Leszek; Marcet-Houben, Marina
Contact toni.gabaldon@crg.eu
Database Description
Phylomes are complete collections of phylogenetic trees for each gene encoded in a given genome. The advent of the sequencing era has resulted in an exponential increase in the number of fully sequenced genomes. This, coupled with improvements in tree reconstruction algorithms, has made phylome reconstruction an increasingly common tool for genome analysis. Reconstructing a phylome generates large amounts of data in the form of multiple sequence alignments and phylogenetic trees. PhylomeDB was first created as a repository for these data, where users could not only download whole phylomes but also browse the individual trees and alignments reconstructed within them. It currently contains more than 120 public phylomes spanning the three domains of life, and the number of trees and alignments stored in the database has already exceeded 1.5 million.
In recent years PhylomeDB has also become a reference resource for orthology and paralogy predictions inferred from the phylogenetic trees it stores. For each tree in the database, PhylomeDB labels the internal nodes as speciation or duplication events, from which orthology and paralogy relationships between sequences can be inferred. This is complemented by a direct link to MetaPhOrs (1), which builds consensus lists of orthologs and paralogs from several of the major phylogenetic tree databases. Complete lists of predictions can also be downloaded for each phylome. One of the main features of PhylomeDB is the enriched visualization of stored trees. Interactive images, created using ETE (2), allow users to view additional information about the displayed sequences and to modify the tree, for example by changing its root or the order of its leaves. PhylomeDB also provides links to further information about the sequences used to reconstruct each phylome and to other major protein databases.
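The event-mapping step can be illustrated with the species-overlap rule, one common way to label gene-tree nodes; the entry does not state which method PhylomeDB uses, so treat this as an assumption. A node is labelled a duplication if the species sets of any two of its child lineages overlap, and a speciation otherwise; leaf pairs whose last common ancestor is a speciation node are called orthologs, and those meeting at a duplication node paralogs. The `Node` class, the `infer_relations` function and the toy gene names are hypothetical and are not part of ETE's API.

```python
from dataclasses import dataclass, field
from itertools import combinations, product


@dataclass
class Node:
    """Hypothetical minimal tree node; `species` is only meaningful on leaves."""
    name: str = ""
    species: str = ""
    children: list = field(default_factory=list)


def leaves(node):
    """Return all leaves below (and including) `node`."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def infer_relations(root):
    """Label internal nodes with the species-overlap rule and collect leaf pairs.

    Each leaf pair is classified at its last common ancestor, the only node
    where the two leaves fall into different child lineages.
    """
    orthologs, paralogs = set(), set()
    stack = [root]
    while stack:
        node = stack.pop()
        if not node.children:
            continue
        stack.extend(node.children)
        child_leaves = [leaves(child) for child in node.children]
        species_sets = [{leaf.species for leaf in group} for group in child_leaves]
        # Shared species between any two child lineages => duplication node.
        is_duplication = any(a & b for a, b in combinations(species_sets, 2))
        target = paralogs if is_duplication else orthologs
        for group_a, group_b in combinations(child_leaves, 2):
            for x, y in product(group_a, group_b):
                target.add(frozenset((x.name, y.name)))
    return orthologs, paralogs


# Toy gene tree ((HUMAN_A, MOUSE_A), (HUMAN_B, MOUSE_B)): one duplication
# at the root followed by a speciation in each copy.
tree = Node(children=[
    Node(children=[Node("HUMAN_A", "human"), Node("MOUSE_A", "mouse")]),
    Node(children=[Node("HUMAN_B", "human"), Node("MOUSE_B", "mouse")]),
])
orthologs, paralogs = infer_relations(tree)
print(sorted(sorted(pair) for pair in orthologs))
# → [['HUMAN_A', 'MOUSE_A'], ['HUMAN_B', 'MOUSE_B']]
```

In this toy tree the four cross-copy pairs (for example HUMAN_A and HUMAN_B) come out as paralogs because they meet at the root, where both lineages contain human and mouse genes.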
Recent Developments
In the latest PhylomeDB release, the number of public phylomes has increased to 120. This includes a large group of newly reconstructed prokaryotic phylomes, an area in which PhylomeDB was previously lacking. In addition, PhylomeDB is part of the Quest for Orthologs initiative (3,4), which aims to establish standards for orthology and paralogy predictions. Given the growing number of phylomes registered in PhylomeDB, we have introduced phylome collections: groups of phylomes that were generated for a common project or reconstructed for a particular group of species.
Tree images have also been enhanced with additional information. The main new feature is a set of graphs that map protein domains onto each leaf of the tree: for each leaf, a small graph represents the sequence with its protein domains drawn on it. Regions not covered by protein domains are drawn as lines colored according to their amino acid composition, and empty parts of the graph represent gaps in the alignment.
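The per-leaf domain graph can be sketched as a segmentation of each aligned sequence into gap, domain and non-domain stretches. This is a minimal illustration, not PhylomeDB's rendering code: the function `domain_map`, the residue classes in `AA_CLASS` and the domain name `DOM1` are hypothetical, and domain coordinates are assumed to be 0-based, half-open positions in the ungapped sequence.

```python
from itertools import groupby

# Hypothetical residue classes used to color regions outside domains;
# PhylomeDB's actual color scheme is not described in this entry.
AA_CLASS = {
    **dict.fromkeys("AVLIMFWC", "hydrophobic"),
    **dict.fromkeys("STNQGPYH", "polar"),
    **dict.fromkeys("DEKR", "charged"),
}


def domain_map(aligned_seq, domains):
    """Split an aligned sequence into (label, start_col, end_col) segments.

    `domains` holds (name, start, end) tuples in 0-based, half-open
    coordinates of the ungapped sequence; segment columns are alignment columns.
    """
    labels = []
    pos = 0  # current position in the ungapped sequence
    for ch in aligned_seq:
        if ch == "-":
            labels.append("gap")  # drawn as an empty part of the graph
            continue
        label = next(
            (f"domain:{name}" for name, start, end in domains if start <= pos < end),
            None,
        )
        if label is None:
            label = "aa:" + AA_CLASS.get(ch.upper(), "other")
        labels.append(label)
        pos += 1

    # Merge runs of identical column labels into drawable segments.
    segments, col = [], 0
    for label, run in groupby(labels):
        width = len(list(run))
        segments.append((label, col, col + width))
        col += width
    return segments


print(domain_map("MK--LVDEAA", [("DOM1", 2, 5)]))
# → [('aa:hydrophobic', 0, 1), ('aa:charged', 1, 2), ('gap', 2, 4),
#    ('domain:DOM1', 4, 7), ('aa:charged', 7, 8), ('aa:hydrophobic', 8, 10)]
```

Working in alignment columns rather than sequence positions keeps the graphs of all leaves in a tree on a common horizontal scale, so gaps and domains line up across sequences.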
Acknowledgements
The authors wish to thank members of Gabaldon's group, the Quest for Orthologs consortium and PhylomeDB users for their suggestions and feedback. They also thank Cristina Amil for her collaboration and CRG Scientific IT for their support.
References
1. Pryszcz LP, Huerta-Cepas J, Gabaldon T. MetaPhOrs: orthology and paralogy predictions from multiple phylogenetic evidence using a consistency-based confidence score. Nucleic Acids Res. 2010;39:e32.
2. Huerta-Cepas J, Dopazo J, Gabaldon T. ETE: a python environment for tree exploration. BMC Bioinformatics 2010;11:24.
3. Gabaldon T, Dessimoz C, Huxley-Jones J, Vilella AJ, Sonnhammer EL, Lewis S. Joining forces in the quest for orthologs. Genome Biol. 2009;10:403.
4. Dessimoz C, Gabaldon T, Roos DS, Sonnhammer EL, Herrero J. Toward community standards in the quest for orthologs. Bioinformatics 2012;28:900-904.
Category: Genomics Databases (non-vertebrate)
Subcategory: General genomics databases
Category: Human and other Vertebrate Genomes
Subcategory: Model organisms, comparative genomics