Ron Caspi, Tomer Altman, Kate Dreher, Carol A. Fulcher, Pallavi Subhraveti, Ingrid M. Keseler, Anamika Kothari, Markus Krummenacker, Mario Latendresse, Lukas A. Mueller, Quang Ong, Suzanne Paley, Anuradha Pujar, Alexander G. Shearer, Michael Travers, Deepika Weerasinghe, Peifen Zhang, Peter D. Karp, The MetaCyc database of metabolic pathways and enzymes and the BioCyc collection of pathway/genome databases, Nucleic Acids Research, Volume 40, Issue D1, 1 January 2012, Pages D742–D753, https://doi.org/10.1093/nar/gkr1014
Abstract
The MetaCyc database (http://metacyc.org/) provides a comprehensive and freely accessible resource for metabolic pathways and enzymes from all domains of life. The pathways in MetaCyc are experimentally determined, small-molecule metabolic pathways and are curated from the primary scientific literature. MetaCyc contains more than 1800 pathways derived from more than 30 000 publications, and is the largest curated collection of metabolic pathways currently available. Most reactions in MetaCyc pathways are linked to one or more well-characterized enzymes, and both pathways and enzymes are annotated with reviews, evidence codes and literature citations. BioCyc (http://biocyc.org/) is a collection of more than 1700 organism-specific Pathway/Genome Databases (PGDBs). Each BioCyc PGDB contains the full genome and predicted metabolic network of one organism. The network, which is predicted by the Pathway Tools software using MetaCyc as a reference database, consists of metabolites, enzymes, reactions and metabolic pathways. BioCyc PGDBs contain additional features, including predicted operons, transport systems and pathway-hole fillers. The BioCyc website and Pathway Tools software offer many tools for querying and analysis of PGDBs, including Omics Viewers and comparative analysis. New developments include a zoomable web interface for diagrams; flux-balance analysis model generation from PGDBs; web services; and a new tool called Web Groups.
INTRODUCTION
MetaCyc (http://metacyc.org/) is a highly curated, non-redundant reference database of small-molecule metabolism. It contains metabolic pathway and enzyme data experimentally demonstrated in the scientific literature (1). Because MetaCyc contains only experimentally determined pathways and enzymes, and due to its tight integration of data and references, MetaCyc is a uniquely valuable resource in fields including genome analysis, metabolism, and metabolic engineering. The metabolic pathways and enzymes in MetaCyc are derived from organisms representing all domains of life.
In conjunction with its role as a general reference on metabolism, MetaCyc is used as a reference database for the PathoLogic component of the Pathway Tools software (2) to computationally predict the metabolic network of any organism having a sequenced and annotated genome (3). In this automated process, a predicted metabolic network is created in the form of a Pathway/Genome Database (PGDB). In addition to the automated creation of PGDBs, the editing capabilities of Pathway Tools enable scientists to improve and update these computationally generated PGDBs by manual curation. MetaCyc has been used by SRI to create more than 1700 PGDBs (as of October 2011), which are available through the BioCyc (http://biocyc.org/) website. Interested scientists may adopt and curate any of these PGDBs through the BioCyc website (http://biocyc.org/intro.shtml#adoption).
In addition, MetaCyc is used by other scientists to create additional PGDBs, many of which are available to the general public through the scientists’ own websites. Together with BioCyc, these PGDBs form the MetaCyc family of databases (4).
More than 100 groups have used Pathway Tools and MetaCyc to create PGDBs for their organisms of interest, including important model organisms such as Saccharomyces cerevisiae (5), Arabidopsis thaliana (6), Oryza sativa (7), Mus musculus (8), Bos taurus (9), Medicago truncatula (10), Populus trichocarpa (11), Dictyostelium discoideum (12), Leishmania major (13), Chlamydomonas reinhardtii (14), several Solanaceae species (15), bioenergy-related organisms (BeoCyc) and many pathogenic bacteria (16) (see http://biocyc.org/otherpgdbs.shtml for a more complete list). A few examples of organisms that were studied within the last year using Pathway Tools include Bacillus acidocaldarius, B. circulans, B. filicolonicus, B. laterosporus, B. licheniformis and B. stearothermophilus (17), Clostridium difficile (18), C. thermocellum (19), Corynebacterium pseudotuberculosis (20), Ignicoccus hospitalis and Nanoarchaeum equitans (21), Mycobacterium species (22,23), Rhizobium etli CFN42 (24), Rhodococcus erythropolis (25), R. opacus PD630 (26), S. cerevisiae and Pichia pastoris (27), Serratia symbiotica (28), Shewanella species (29), Vibrio vulnificus (30), Xanthomonas axonopodis (31) and X. citri (32). Pathway Tools can also generate PGDBs from metagenomic data sets (33).
A web server included in Pathway Tools enables the publishing of PGDBs through either the Internet or an internal network. The Navigator component of Pathway Tools allows the browsing and analysis of PGDBs either locally or over the Internet. A detailed description of Pathway Tools can be found in (34).
PGDBs generated by Pathway Tools and MetaCyc are an excellent platform for the integration of genome information with many other types of data regarding metabolism, regulation and genetics. They provide powerful tools for analyzing omics data sets from experiments related to gene transcription, metabolomics, proteomics, ChIP-chip analysis and so on. During the past 2 years, we again significantly expanded the data content of MetaCyc and BioCyc. We also added supporting enhancements to the Pathway Tools software and BioCyc website, as described in the following sections.
METACYC ENHANCEMENTS
Expansion of MetaCyc
All pathways in MetaCyc are curated from the experimental literature. Since the last Nucleic Acids Research publication (2 years ago) (1), we added 413 new base pathways (pathways composed of reactions only, where no portion of the pathway is designated as a subpathway) and 40 superpathways (pathways composed of at least one base pathway plus additional reactions or pathways), and updated 107 existing pathways, for a total of 560 new and revised pathways. The total number of base pathways grew by 28%, from 1399 (version 13.5) to 1790 (version 15.5); the net increase is less than 413 pathways because some existing pathways were deleted from the database during this period. The total number of superpathways grew by 17%, from 235 (version 13.5) to 275 (version 15.5).
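The growth figures quoted above can be checked with a few lines of arithmetic. The following Python sketch uses only the counts given in this section; the helper name `percent_growth` is introduced here for illustration and is not part of MetaCyc or Pathway Tools.

```python
def percent_growth(old: int, new: int) -> float:
    """Return the relative growth from `old` to `new`, in percent."""
    return (new - old) / old * 100.0


# Counts quoted in the text for versions 13.5 and 15.5.
base_growth = percent_growth(1399, 1790)
super_growth = percent_growth(235, 275)

# New base pathways + new superpathways + updated existing pathways.
new_and_revised = 413 + 40 + 107

# Net change in base pathways; smaller than the 413 additions
# because some existing pathways were deleted in the same period.
net_base_increase = 1790 - 1399

print(f"base pathways: +{base_growth:.0f}%")
print(f"superpathways: +{super_growth:.0f}%")
print(f"new and revised pathways: {new_and_revised}")
print(f"net base-pathway increase: {net_base_increase}")
```

The net increase of 391 base pathways is 22 fewer than the 413 additions, which is consistent with the statement that some pathways were removed during this period.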
Along with the increase in pathway number, the number of enzymes, reactions, chemical compounds and citations in the database grew by 30, 19, 13 and 49%, respectively; the number of referenced organisms increased by 23% (currently at 2216).
New pathway classes defined in MetaCyc
The pathways in MetaCyc are classified by an ontology developed at SRI that is constantly updated to reflect curation needs. Recently, we added two new top-level classes to that ontology: Activation/Inactivation/Interconversion and Metabolic Clusters.
The Activation/Inactivation/Interconversion class was added to describe certain pathways that did not fit well into any other classes, and, as its name implies, includes the three subclasses: Activation, Inactivation and Interconversion. In contrast to a standard ‘biosynthesis’ pathway in which a biologically active compound is synthesized from precursor molecules, activation pathways involve relatively minor chemical modifications to existing compounds that result in a substantial increase in their biological activity. An example activation pathway is sulfate activation for sulfonation.
Similarly, inactivation pathways involve relatively minor chemical modifications to existing biologically active compounds that result in a substantial decrease in their biological activity. This is in contrast to standard ‘degradation’ pathways in which a more complex compound is broken down into a set of simple metabolites. An example inactivation pathway is gibberellin inactivation II (methylation).
Interconversion pathways describe the bidirectional conversion of a bio-molecule to a different form, where the forward and backward conversions often prompt significant changes in the biological activity of the compound, resulting in its activation and deactivation, respectively. For an example, see medicarpin conjugates interconversion.
The Metabolic Clusters class was added to classify metabolic diagrams that do not describe the classical notion of a pathway. In pathways, all reactions are connected to one another, whereas metabolic clusters comprise a collection of non-connected but related reactions that together describe a common phenomenon. For example, see tRNA methylation (yeast), which describes a collection of tRNA methyltransferase-catalyzed reactions in yeast.
Ontology distribution of MetaCyc pathways
The six top-level categories (or classes) of the MetaCyc pathway ontology are Biosynthesis, Degradation/Utilization/Assimilation, Generation of Precursor Metabolites and Energy, Detoxification, Activation/Inactivation/Interconversion and Metabolic Clusters.
In version 15.5, the largest top-level class is Biosynthesis, with 1143 base pathways. Its main subclasses are Secondary Metabolites Biosynthesis (447); Cofactors, Prosthetic Groups, and Electron Carriers Biosynthesis (186); Fatty Acids and Lipids Biosynthesis (124); and Amino Acids Biosynthesis (110).
The second-largest top-level class is Degradation/Utilization/Assimilation, with 793 base pathways. Within this group, the largest subclasses are Aromatic Compounds Degradation (167), Amino Acids Degradation (117), Inorganic Nutrients Metabolism (94), Secondary Metabolites Degradation (85) and Carbohydrates Degradation (84).
The third-largest top-level class, Generation of Precursor Metabolites and Energy, contains 158 base pathways. Its largest subclasses are Fermentation (46), Respiration (28), Chemoautotrophic Energy Metabolism (15), Methanogenesis (13) and Electron Transfer (13).
The other three top-level classes are much smaller. The Detoxification class doubled in size and now contains 32 base pathways, and the new Activation/Inactivation/Interconversion and Metabolic Clusters classes contain 22 and 19 pathways, respectively.
During the previous 2 years, the number of metazoan pathways in MetaCyc increased by 42%, from 174 to 247 pathways. Plant pathways increased by 22% to 784, and archaeal pathways increased by 17% to 126. The number of pathways classified as bacterial actually decreased by 12%, as a result of a more accurate taxonomic classification of pathways.
Table 1 lists the species with the largest number of experimentally elucidated pathways in MetaCyc (meaning that there is experimental evidence for the occurrence of these pathways in the organism), while Table 2 describes the distribution of pathways in MetaCyc based on the taxonomic classification of associated species. The list of pathways added to MetaCyc since the last NAR publication is too long to specify here. For a complete report, see the MetaCyc Release Notes history at http://metacyc.org/release-notes.shtml.
Table 1. Species with the largest number of experimentally elucidated pathways in MetaCyc

| Bacteria | Pathways | Eukarya | Pathways | Archaea | Pathways |
|---|---|---|---|---|---|
| Escherichia coli | 276 | Arabidopsis thaliana | 311 | Methanosarcina barkeri | 20 |
| Pseudomonas aeruginosa | 66 | Homo sapiens | 186 | Methanocaldococcus jannaschii | 21 |
| Bacillus subtilis | 57 | Saccharomyces cerevisiae | 134 | Methanosarcina thermophila | 18 |
| Pseudomonas putida | 49 | Rattus norvegicus | 77 | Sulfolobus solfataricus | 18 |
| Salmonella typhimurium | 36 | Glycine max | 67 | | |
| Pseudomonas fluorescens | 30 | Mus musculus | 50 | | |
| Mycobacterium tuberculosis | 26 | Solanum lycopersicum | 48 | | |
| Agrobacterium tumefaciens | 25 | Pisum sativum | 47 | | |
| Enterobacter aerogenes | 25 | Zea mays | 46 | | |
| Klebsiella pneumoniae | 21 | Solanum tuberosum | 41 | | |
| Mycobacterium smegmatis | 18 | Nicotiana tabacum | 39 | | |
| Delftia acidovorans | 18 | Oryza sativa | 35 | | |
| | | Hordeum vulgare | 31 | | |
| | | Spinacia oleracea | 25 | | |
| | | Triticum aestivum | 23 | | |
| | | Bos taurus | 22 | | |
| | | Sus scrofa | 18 | | |
| | | Petunia × hybrida | 18 | | |
The species are grouped by taxonomic domain and are ordered within each domain based on the number of pathways (number following species name) to which the given species was assigned. Some pathways may be labeled with a higher-level taxon, such as genus, if all the species within that genus are thought to have the given pathway. However, such higher-level taxa are not included in this table.
Table 2. Distribution of pathways in MetaCyc based on the taxonomic classification of associated species

| Bacteria | Pathways | Eukarya | Pathways | Archaea | Pathways |
|---|---|---|---|---|---|
| Proteobacteria | 900 | Viridiplantae | 784 | Euryarchaeota | 125 |
| Firmicutes | 258 | Fungi | 271 | Crenarchaeota | 37 |
| Actinobacteria | 214 | Metazoa | 247 | | |
| Bacteroidetes/Chlorobi | 59 | Euglenozoa | 24 | | |
| Cyanobacteria | 48 | Alveolata | 15 | | |
| Deinococcus-Thermus | 25 | Amoebozoa | 10 | | |
| Tenericutes | 19 | Stramenopiles | 5 | | |
| Thermotogae | 19 | Fornicata | 4 | | |
| Aquificae | 13 | Rhodophyta | 4 | | |
| Spirochaetes | 12 | Haptophyceae | 3 | | |
| Chlamydiae-Verrucomicrobia | 6 | Parabasalia | 3 | | |
| Planctomycetes | 6 | | | | |
| Chloroflexi | 4 | | | | |
| Fusobacteria | 4 | | | | |
| Nitrospirae | 2 | | | | |
| Thermodesulfobacteria | 2 | | | | |
| Chrysiogenetes | 1 | | | | |
For example, the entry ‘Tenericutes 19’ means that there is experimental evidence for the occurrence of at least 19 MetaCyc pathways in members of this taxonomic group. Major taxonomic groups are grouped by domain and are ordered within each domain based on the number of pathways (number following taxon name) associated with the taxon. A pathway may be associated with multiple organisms.
Curation of bioenergy pathways
Bioenergy is a rapidly growing area of research that focuses primarily on biomass conversion and biofuels production. To address the needs of the bioenergy research community we have made a priority of curating bioenergy-related pathways and enzymes in MetaCyc, starting with version 15.1 (released June 2011). Fields that receive attention are hydrogen production, cellulosic biomass biosynthesis and degradation, and algal oil production. So far we have created seven different hydrogen biosynthesis pathways, provided upgraded structures and commentary to many of the cellulosic biomass components, such as cellulose, hemicelluloses, xylan, arabinan, arabinogalactan, arabinoxylan, glucuronoxylan, glucomannan, galactomannan, galactoglucomannan and rhamnogalacturonan, and curated pathways for the biosynthesis and degradation of several of these polymers by different organisms. For an example, see cellulose degradation I (cellulosome).
Curation of engineered pathways
Since its inception, MetaCyc had included only natural pathways that occur in unmodified organisms. However, over the years users indicated to us that it would be useful to include genetically engineered pathways in the database. Version 15.5 of MetaCyc (released October 2011) is the first to include such engineered pathways. To avoid confusion, engineered pathways are clearly indicated by the title ‘MetaCyc Engineered Pathway’ next to the pathway name. A text line above the summary indicates ‘Note: This is an engineered pathway. It does not occur naturally in any known organism, and has been constructed in a living cell by metabolic engineering.’
In addition, the organisms that contributed enzymes to the pathway are listed under the description ‘The enzymes catalyzing the steps of this pathway have been assembled from the following organisms’. Engineered pathways are excluded by our PathoLogic software when predicting the presence of pathways in organism-specific PGDBs.
For an example of an engineered pathway, see pyruvate fermentation to hexanol.
Chimeric and conspecific pathways
Users of MetaCyc are familiar with the concept of superpathways, which are constructed in PGDBs by combining multiple elements (at least one base pathway or superpathway, along with additional pathways or reactions) to show relationships between them and depict a larger portion of the metabolic network within a single diagram. Although most MetaCyc superpathways consist of pathways known to occur in the same organism, we sometimes find it useful to construct superpathways from pathways that are known to occur in different organisms. Combining such pathways into a single superpathway can provide an overview of a metabolic field. For example, combining all the known pathways for aerobic degradation of aromatic compounds into a single diagram provides a useful overview of this topic [see superpathway of aromatic compound degradation (aerobic)].
To distinguish such pathways from those that occur in their entirety in a single organism, we defined the terms ‘conspecific pathways’ and ‘chimeric pathways’.
While a conspecific pathway comprises a set of reactions that are expected to be found within each organism that has the pathway, a chimeric pathway comprises reactions from multiple organisms, and most commonly does not occur in its entirety in a single organism. Only sections of chimeric pathways are likely to occur in their entirety in single organisms. The two types of pathways are treated differently by the PathoLogic program during the creation of new PGDBs. When PathoLogic predicts a conspecific pathway to occur in another organism, the pathway will be transferred to that organism in its entirety. In the near future we will enhance PathoLogic so that when it predicts a chimeric pathway to occur in an organism-specific PGDB, it will remove extraneous reactions from the pathway to produce a conspecific version of the pathway. Conspecific pathways can be either base pathways or superpathways, while chimeric pathways are always superpathways.
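The planned pruning of chimeric pathways can be pictured as a simple filter. The sketch below is an illustration only, not PathoLogic's algorithm: the function `prune_chimeric`, the reaction identifiers and the notion of a per-organism reaction set are all hypothetical stand-ins.

```python
from typing import List, Set


def prune_chimeric(pathway: List[str], organism_reactions: Set[str]) -> List[str]:
    """Keep, in pathway order, only the reactions of a chimeric pathway
    that the target organism is predicted to catalyze."""
    return [rxn for rxn in pathway if rxn in organism_reactions]


# Hypothetical chimeric superpathway and hypothetical predicted reaction set.
chimeric = ["rxn1", "rxn2", "rxn3", "rxn4"]
organism = {"rxn1", "rxn2", "rxn9"}

conspecific = prune_chimeric(chimeric, organism)
print(conspecific)
```

A real implementation would also need to decide what to do when the retained reactions no longer form a connected pathway; this sketch ignores that question.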
To alert the user to the fact that a pathway is chimeric the following note appears above the summary section: ‘This is a chimeric pathway, comprising reactions from multiple organisms, and typically will not occur in its entirety in a single organism. The taxa listed here are likely to catalyze only subsets of the reactions depicted in this pathway.’ In addition, the pathway's title states ‘MetaCyc Chimeric Pathway’.
Kinetic data in PGDBs
We have recently more than doubled the number of types of enzyme kinetic data that can be captured in Pathway Tools PGDBs. When available, the following types of data are now collected for newly curated MetaCyc enzymes: Vmax, kcat, specific activity, optimal temperature, optimal pH, Ki values for inhibitors and Km values for substrates.
Interactions with other databases
IUBMB
MetaCyc is regularly updated with data from the Nomenclature Committee of the International Union of Biochemistry and Molecular Biology (NC-IUBMB), which includes new and modified EC entries. The last supplement incorporated is supplement 17, and the data was retrieved from the ExplorEnz database (35). In addition, starting with release 15.0, the EC entries at ExplorEnz are linked to MetaCyc reaction pages and vice versa.
NCBI taxonomy
The full NCBI Taxonomy database (36) is integrated into Pathway Tools, enabling specification of taxa using NCBI Taxonomy, and allowing taxonomic querying of MetaCyc pathways and enzymes. We continue to update the taxonomy entries with each major release of MetaCyc.
Gene ontology
The mapping between MetaCyc reactions and Gene Ontology (GO) process and function terms (37) is continuously maintained by the GO Editorial Office at the EBI. An updated file is available at http://www.geneontology.org/external2go/metacyc2go.
Links to other databases
During the last 2 years we have added extensive links from MetaCyc to PubChem and to KEGG. In version 15.5 of MetaCyc there are 4014 reactions that contain links to KEGG reactions. MetaCyc compounds contain 4449 links to KEGG compounds, 8814 links to PubChem compounds and 3800 links to ChEBI compounds.
EXPANSION OF BIOCYC
The BioCyc databases are organized into three tiers.
Tier 1 PGDBs have received at least 1 year of manual curation. While some Tier 1 PGDBs (e.g. MetaCyc and EcoCyc) received decades of manual curation and are updated continuously, others are less well curated and are still in need of significant curation.
Tier 2 PGDBs have received moderate amounts of review (<1 year), and may or may not be updated on an ongoing basis.
Tier 3 PGDBs were created computationally, and received no subsequent manual review or updating.
During the past 2 years, the number of BioCyc PGDBs increased from 508 (version 13.1) to 1129 (version 15.1). Version 15.5, to be released in October 2011, will include >1700 PGDBs. The PGDBs AraCyc (A. thaliana col, curated by PMN) and YeastCyc (S. cerevisiae, curated by SGD) have been promoted from Tier 2 to Tier 1 status, and the PGDB HumanCyc (Homo sapiens, curated by SRI) will be upgraded to Tier 1 starting with release 15.5, bringing the total of Tier 1 PGDBs to five (along with EcoCyc and MetaCyc). As of version 15.1, Tier 2 includes 32 PGDBs, and Tier 3 includes 1093 PGDBs. Some Tier 2 PGDBs were provided by groups outside SRI. Database authors are identified on the database summary page (Tools → Reports → Summary Statistics).
SOFTWARE AND WEBSITE ENHANCEMENTS
The following paragraphs describe significant enhancements to Pathway Tools and to the BioCyc website during the past 2 years.
Web groups—sharing and analysis of object groups
Starting in July 2011, BioCyc includes a new feature called Web Groups, which extends the web-based interface to allow end users to create, share and compute with collections of Pathway Tools objects (Figures 1–3). Web Groups are a step in the direction of making Pathway Tools a platform for collaborative computing and knowledge sharing.
A Web Group is a spreadsheet-like structure that can contain both Pathway Tools objects and other values such as numbers or strings. Like a spreadsheet, it is organized by rows and columns. The typical group contains a set of Pathway Tools objects in the first column (e.g. a set of genes generated by a search). The other columns contain properties of the objects (e.g. the chromosome position of each gene), or the result of a transformation (e.g. the reactions catalyzed by the gene products, or the corresponding genes from a different organism). The system provides 35 built-in transformations, each of which applies to a specific type of object. Example transformations include transforming a group of genes into the group of pathways containing those genes, or into the group of all genes that regulate the expression of those genes, and transforming a group of pathways into a group of all metabolites that are substrates within those pathways. The transformations can be applied to columns other than the first, creating a spreadsheet-like cascade.
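The column-and-transformation model described above can be sketched in a few lines of Python. This is a toy illustration, not the Web Groups implementation or its API: the class `WebGroup`, its methods, and the gene, product and reaction names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class WebGroup:
    """A toy spreadsheet-like group: named columns of equal length."""
    columns: Dict[str, List[object]] = field(default_factory=dict)

    def add_transform(self, source: str, target: str,
                      fn: Callable[[object], object]) -> None:
        """Add column `target` by applying `fn` to every cell of `source`."""
        self.columns[target] = [fn(cell) for cell in self.columns[source]]

    def delete_row(self, index: int) -> None:
        """Remove one row across all columns."""
        for cells in self.columns.values():
            del cells[index]


# Hypothetical lookup tables standing in for a PGDB.
GENE_PRODUCT = {"geneA": "enzymeA", "geneB": "enzymeB", "geneC": "proteinC"}
PRODUCT_REACTIONS = {"enzymeA": ["rxn1", "rxn2"], "enzymeB": ["rxn3"]}

group = WebGroup({"gene": ["geneA", "geneB", "geneC"]})
# First transformation: genes -> products.
group.add_transform("gene", "product", lambda g: GENE_PRODUCT.get(g))
# Second transformation, applied to a non-first column: products -> reactions.
group.add_transform("product", "reactions",
                    lambda p: PRODUCT_REACTIONS.get(p, []))
# Drop the row whose product catalyzes no reaction.
group.delete_row(2)
print(group.columns)
```

Applying the second transformation to the `product` column rather than the first column mirrors the cascade described in the text.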
Web Groups can be created from search result sets, by importing data from external spreadsheets or text files, and by adding objects individually from either their web pages or from the group itself. They can be exported to spreadsheets, and group columns of the appropriate types can be exported to the cellular overview. Web Groups can be shared publicly, or with selected other users.
The Web Groups interface also allows users to apply an enrichment/depletion analysis to the contents of a group (Figure 3). Enrichment/depletion analysis enables users to evaluate over- or under-representation of certain qualities or traits within an object group—for example, determining which genes out of a specified gene group are involved in one or more Gene Ontology categories. To enable this type of analysis, Pathway Tools includes a statistical analysis engine that can be applied to the content of groups. Performing enrichment analysis on a group results in creation of a new group that contains the analysis results.
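The text does not state which statistical test the analysis engine applies. As a minimal sketch, assuming a one-sided hypergeometric test for over-representation (a common choice for this kind of analysis), the p-value for a single category can be computed as follows. The function name and the example counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population: int, annotated: int,
                           sample: int, hits: int) -> float:
    """Return P(X >= hits) when `sample` items are drawn without replacement
    from `population` items, of which `annotated` carry the category."""
    max_hits = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(comb(annotated, k) * comb(population - annotated, sample - k)
               for k in range(hits, max_hits + 1))
    return tail / total


# Hypothetical example: 20 genes in the organism, 5 annotated with a category;
# a group of 4 genes contains 3 of the annotated ones.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=3)
print(f"p = {p_value:.4f}")
```

In practice such a p-value would be computed for every category and corrected for multiple testing; depletion can be tested with the opposite tail.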
Example use cases for Groups:
Users are interested in genes of the trp operon. They perform a search for genes containing the string ‘trp’, and turn the results into a group. Some of the gene names do not seem to contain that string, so the users add a column for the gene synonyms to see why they matched. After doing that, the users can see that some do not belong (e.g. the ribB gene matched because of the synonym ‘htrP’), so they delete that row from the group table. They then use a transformation from genes to their products, adding a column with the gene products; a second transformation adds a column containing the reactions that the products catalyze (Figure 1). Next they use additional transformations to obtain the substrates involved in those reactions, to create a new group from those substrates, and to add the molecular structures (Figure 2).
The users have obtained an essential gene list from experimental investigations. They can define a Web Group containing those essential genes, and use group operations to highlight the genes on the cellular overview to view its metabolic pathway distribution, or use enrichment analysis to determine over-represented GO categories.
The users have obtained a set of metabolites of interest from a metabolomics experiment. They can perform an enrichment analysis to determine over-represented metabolic pathways in that group.
New web cellular overview
We have re-engineered the web-based metabolic map diagrams available via Pathway Tools (38). As in the desktop version of Pathway Tools, the new web versions of these diagrams are organism specific, capturing the unique metabolic pathway complement of each organism, and are created by automatic layout algorithms (Figure 4). The diagrams are zoomable and queryable; users can search for metabolic entities (e.g. metabolites, enzymes and pathways) by various criteria such as by name and by EC number. Search results are highlighted on the diagram to indicate their locations. An omics viewer mode allows the diagram to be painted with large-scale data sets such as gene-expression, metabolomics and reaction flux data. Such displays can be animated (for data sets containing multiple time points), and are still zoomable. Omics data can be painted programmatically using web services (38), and bookmarks can be generated to save highlighting patterns for later use. Extensive tooltips are provided to identify metabolites, reactions and pathways within the diagram on mouse rollover.
Generation of flux-balance models from PGDBs
Pathway Tools now has the ability to generate genome-scale flux-balance analysis (FBA) models from PGDBs. Our goals for this effort were to accelerate FBA model development, and to streamline the interpretation of modeling results. We achieved those goals in several ways.
In our approach, the PGDB is both a database and an executable model. Therefore, the user can query, browse and edit the metabolic model within the PGDB using the many interactive features of Pathway Tools (such as reaction and pathway editors). The user programmatically generates from the PGDB the set of linear equations that comprise the FBA model, and Pathway Tools invokes the SCIP (40) linear solver to solve those equations, and then obtains the results via the SCIP API.
Since the FBA modeling is tightly integrated with Pathway Tools, the user does not need to directly invoke the linear solver, nor inspect its output files; Pathway Tools can paint the resulting fluxes onto the Cellular Overview for visual analysis. In addition, Pathway Tools guides the user in producing a complete functional model that produces all metabolites in the biomass equation.
We have developed special capabilities within Pathway Tools for accelerating the development of FBA models using a multiple-gap-filling approach. Using past techniques, FBA models typically had development times on the order of 1 year because metabolic network models are always incomplete at the start of the model development process, and it is very time consuming to determine how to extend the model to become functional. Using the new Pathway Tools functionality, we were able to build FBA models for the EcoCyc and HumanCyc PGDBs in ∼1 month each. Pathway Tools uses a meta-optimization approach that simultaneously considers several alternative types of model modification and suggests a minimal set of changes that maximizes the number of metabolites in the biomass equation that the FBA model is able to produce. The software suggests new reactions to add to the model from MetaCyc, proposes reactions within the model whose directions should be reversed, and suggests additional nutrients and secreted compounds that can be added to the model. Furthermore, in contrast to other existing tools, when metabolites cannot be produced by the model, Pathway Tools identifies those compounds, allowing the user to focus model debugging efforts on specific metabolites.
The Pathway Tools FBA module also supports evaluation of single and multiple gene and reaction knock-outs; genes or reactions whose removal prevents production of any biomass component are judged to be essential. The FBA module is available only in the desktop mode of Pathway Tools, and is not accessible via Pathway Tools based websites.
Dead-end metabolite finder
Identifying dead-end metabolites is a valuable method for finding errors and incompleteness in a metabolic network, for FBA modeling and other applications. Dead-end metabolites are compounds that are only produced by, or only consumed by, the metabolic network of an organism. Although such situations sometimes reflect the correct biology, they usually indicate errors in the metabolic model. A tool for identifying dead-end metabolites is available in both web (Tools → Dead End Metabolites) and desktop modes.
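The definition of a dead-end metabolite can be expressed as a simple set difference. The following is a minimal sketch under stated assumptions, not the Pathway Tools implementation: each reaction is a hypothetical (substrates, products) pair treated as irreversible, and transport reactions, nutrients and secreted compounds receive no special handling.

```python
def find_dead_ends(reactions):
    """Return metabolites that are only produced or only consumed.

    `reactions` is an iterable of (substrates, products) pairs, each a
    list of metabolite names. Reactions are treated as irreversible as
    written (an assumption of this sketch)."""
    produced, consumed = set(), set()
    for substrates, products in reactions:
        consumed.update(substrates)
        produced.update(products)
    return {
        "only_produced": produced - consumed,  # made but never used
        "only_consumed": consumed - produced,  # used but never made
    }
```

For a linear chain A → B → C, this sketch reports A as only consumed and C as only produced; in a real model A might be a legitimate nutrient and C a legitimate secreted product, which is why such hits need manual review.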
Choke-point finder
Metabolic choke points are metabolites that are either produced by only a single reaction in the metabolic network or consumed by only a single reaction in the network. Choke points were found to be enriched for anti-microbial drug targets (41). A tool for identifying metabolic choke points is available in both web (Tools → Chokepoint Reactions) and desktop modes.
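The choke-point definition can be sketched by counting, for each metabolite, the reactions that produce it and the reactions that consume it. The function name, reaction identifiers and data layout below are hypothetical, reactions are treated as irreversible, and dead-end metabolites are not filtered out; the actual Pathway Tools tool may differ on all of these points.

```python
from collections import defaultdict


def find_choke_points(reactions):
    """Map each choke-point metabolite to its single producing or
    consuming reaction.

    `reactions` maps a reaction id to a (substrates, products) pair.
    Returns two dicts: metabolites with exactly one producer, and
    metabolites with exactly one consumer."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for rxn_id, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(rxn_id)
        for metabolite in products:
            producers[metabolite].add(rxn_id)
    uniquely_produced = {m: next(iter(r)) for m, r in producers.items() if len(r) == 1}
    uniquely_consumed = {m: next(iter(r)) for m, r in consumers.items() if len(r) == 1}
    return uniquely_produced, uniquely_consumed
```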
Web services
Web services allow programs to query structured data from websites and to invoke web computations. Starting with version 14.5 (Fall 2010), Pathway Tools based websites provide a number of web services (see http://biocyc.org/web-services.shtml), including:
Retrieving XML-structured information about individual genes, pathways, reactions, metabolites and so on,
Performing targeted queries that return XML results, such as retrieving all of the genes or metabolites within a metabolic pathway,
Executing queries in the BioVelo (42) language against a PGDB,
Highlighting sets of objects in the Cellular Overview and
Displaying omics data on the Cellular Overview, on a table of pathways, and on individual pathways.
BioCyc ortholog data
BioCyc makes extensive use of ortholog data. Examples of ortholog use in BioCyc include local alignment of a chromosome region in a multi-genome browser, an option to show the ortholog of a gene or a protein in another organism by selecting the command ‘Gene (Protein) → Show This Gene (Protein) in Another Database’, and an editor that allows propagation of annotations from one PGDB to another, across multiple genes, based on orthology. Starting with version 15.0, BioCyc ortholog information is computed in house by running NCBI BLAST pair-wise searches between all proteomes of all PGDBs. We consider orthologs to be genes that are likely to be counterparts of one another in two different organisms because they are the most closely related in this pair of organisms, and we define two proteins as orthologs if they are the bi-directional best BLAST hits of one another.
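The bi-directional best hit criterion can be sketched as follows. This is an illustration only: it assumes the BLAST results have already been parsed into hypothetical (query, subject, score) tuples, it ranks hits by a single score, and it does not reproduce any thresholds or tie-breaking rules that the BioCyc pipeline may apply.

```python
def best_hits(hits):
    """Return {query: best subject} from (query, subject, score) tuples,
    keeping the highest-scoring subject for each query."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where b is the best hit of a in
    proteome B and a is the best hit of b in proteome A."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)
```

Requiring agreement in both directions means that a protein whose best hit in the other proteome is itself better matched by a different protein is left without an ortholog call.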
Combined gene/protein/RNA pages
BioCyc and other Pathway Tools based websites previously generated separate information pages for genes and their products. However, we merged these two pages into a single page because users found it difficult to remember which information was contained in which page, and some users never realized that both types of pages existed. Thus, a single page now provides information about genes and their protein or RNA products.
New monoisotopic mass data and search
To facilitate analysis of metabolomics data in BioCyc, starting with release 14.5 we augmented the compound search form on our website to allow searching for a list of monoisotopic molecular weight values, of the type produced by high-resolution mass spectrometry. The search can be accessed from the menu item Search → Compounds (Figure 5) and allows changing the tolerance in ppm increments. The search results are presented in a table that allows easy linking to compound pages, to simplify the identification of plausible candidates for each weight value.
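Matching observed masses against compound masses within a ppm tolerance can be sketched as below. The function names, the default tolerance and the small compound table are hypothetical, and the listed masses are approximate illustrative values rather than values taken from BioCyc.

```python
# Approximate monoisotopic masses (Da), for illustration only.
COMPOUNDS = {
    "D-glucose": 180.06339,
    "L-tryptophan": 204.08988,
    "pyruvic acid": 88.01604,
}


def ppm_error(observed, theoretical):
    """Absolute mass error in parts per million of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6


def search_by_mass(observed_masses, compounds, tolerance_ppm=5.0):
    """For each observed mass, list (compound, ppm error) candidates
    within the tolerance, closest match first."""
    results = {}
    for mass in observed_masses:
        hits = [
            (name, ppm_error(mass, theoretical))
            for name, theoretical in compounds.items()
            if ppm_error(mass, theoretical) <= tolerance_ppm
        ]
        results[mass] = sorted(hits, key=lambda hit: hit[1])
    return results
```

Widening the tolerance returns more candidates per observed mass, which mirrors the trade-off a user makes when adjusting the ppm setting on the search form.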
Organism selection by taxonomy
One of the challenges in designing the BioCyc website was to enable easy selection of a PGDB of interest from the large number of available databases. Previously the only selection mechanism was based on the name of the desired organism. Starting with version 15.5, it is possible to select a PGDB from BioCyc by browsing the organism taxonomy (Figure 6). In addition, the new selector window contains an option to display the Organism Summary page upon PGDB selection. This page provides background information about the PGDB such as an author list, the source for the sequence, the number and type of replicons that were used for creating the PGDB, the taxonomic lineage of the organism and relevant publications, as well as some statistics about the content of the database.
Ports to 64-bit Windows and 64-bit Macintosh platforms
We have ported Pathway Tools to the 64-bit Windows and Macintosh platforms. Henceforth, 32-bit versions of Pathway Tools will not be available for those platforms.
Miscellaneous enhancements
We have made many improvements to the Pathway Tools pathway layout algorithms to improve the aesthetics of pathway layouts. We have changed the color scales used in the omics viewers to improve them from a human factors perspective. We added a signaling pathway editor to Pathway Tools. We have made many performance improvements to the web mode of Pathway Tools.
How to learn more about MetaCyc and BioCyc
The BioCyc.org and MetaCyc.org websites provide several informational resources, including an online BioCyc guided tour (http://biocyc.org/samples.shtml), a guide to the BioCyc database collection (http://biocyc.org/BioCycUserGuide.shtml), a guide for MetaCyc (http://www.metacyc.org/MetaCycUserGuide.shtml), a guide for EcoCyc (http://biocyc.org/ecocyc/EcoCycUserGuide.shtml), a Pathway/Genome Database Concepts Guide (http://biocyc.org/PGDBConceptsGuide.shtml) and many webinar videos that combine narration with online demonstration of different topics (http://biocyc.org/webinar.shtml). We routinely host workshops and tutorials (on site and at conferences) that provide training and in-depth discussion of our software for beginning and advanced users. To stay informed about recent changes and enhancements to our software, join the BioCyc mailing list at http://biocyc.org/subscribe.shtml. A list of our publications is available online (http://biocyc.org/publications.shtml).
FUTURE PLANS
A variety of additional enhancements are planned. We are currently working on adding reaction atom mappings to MetaCyc and other PGDBs. Plans include the addition of many more genomes to BioCyc, including those from the Human Microbiome Project, and the addition of more types of data to BioCyc PGDBs, such as predicted GO terms and protein localizations.
DATABASE AVAILABILITY
The MetaCyc and BioCyc databases are freely and openly available to all. See http://biocyc.org/download.shtml for download information. New versions of the downloadable data files and of the BioCyc and MetaCyc websites are released four times per year.
FUNDING
National Institute of General Medical Sciences of the National Institutes of Health (grants GM080746, GM077678, GM088849 and GM075742); Department of Energy (bioenergy-related pathway curation, grant DE-SC0004878); National Science Foundation (MetaCyc curation performed by the Plant Metabolic Network, grants IOS-1026003 and DBI-0640769). Funding for open access charge: A grant from the National Institute of General Medical Sciences of the National Institutes of Health (NIH).
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
This article was prepared as an account of work sponsored by an agency of the US Government. Neither the US Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe on privately owned rights. Reference herein to any specific commercial product, process or service by trade name, trademark, manufacturer or otherwise does not necessarily constitute or imply its endorsement, recommendation or favoring by the US government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the US Government or any agency thereof.