  • Analysis
The support of human genetic evidence for approved drug indications

Abstract

Over a quarter of drugs that enter clinical development fail because they are ineffective. Growing insight into genes that influence human disease may affect how drug targets and indications are selected. However, there is little guidance about how much weight should be given to genetic evidence in making these key decisions. To answer this question, we investigated how well the current archive of genetic evidence predicts drug mechanisms. We found that, among well-studied indications, the proportion of drug mechanisms with direct genetic support increases significantly across the drug development pipeline, from 2.0% at the preclinical stage to 8.2% among mechanisms for approved drugs, and varies dramatically among disease areas. We estimate that selecting genetically supported targets could double the success rate in clinical development. Therefore, using the growing wealth of human genetic data to select the best targets and indications should have a measurable impact on the successful development of new drugs.

Figure 1: Summary of data resources and mappings between them.
Figure 2: Enrichment of target genes for drugs approved in the United States or the European Union.
Figure 3: Overlap between drug targets and their indications with genetic associations for similar traits.

References

  1. DiMasi, J.A., Feldman, L., Seckler, A. & Wilson, A. Trends in risks associated with new drug development: success rates for investigational drugs. Clin. Pharmacol. Ther. 87, 272–277 (2010).

  2. Arrowsmith, J. & Miller, P. Trial watch: phase II and phase III attrition rates 2011–2012. Nat. Rev. Drug Discov. 12, 569 (2013).

  3. Cook, D. et al. Lessons learned from the fate of AstraZeneca's drug pipeline: a five-dimensional framework. Nat. Rev. Drug Discov. 13, 419–431 (2014).

  4. Morgan, P. et al. Can the flow of medicines be improved? Fundamental pharmacokinetic and pharmacological principles toward improving Phase II survival. Drug Discov. Today 17, 419–424 (2012).

  5. Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014).

  6. Plenge, R.M., Scolnick, E.M. & Altshuler, D. Validating therapeutic targets through human genetics. Nat. Rev. Drug Discov. 12, 581–594 (2013).

  7. Kathiresan, S. et al. Common variants at 30 loci contribute to polygenic dyslipidemia. Nat. Genet. 41, 56–65 (2009).

  8. Okada, Y. et al. Genetics of rheumatoid arthritis contributes to biology and drug discovery. Nature 506, 376–381 (2014).

  9. Sanseau, P. et al. Use of genome-wide association studies for drug repositioning. Nat. Biotechnol. 30, 317–320 (2012).

  10. Li, M.J. et al. GWASdb: a database for human genetic variants identified by genome-wide association studies. Nucleic Acids Res. 40, D1047–D1054 (2012).

  11. Wang, Z.Y. & Zhang, H.Y. Rational drug repositioning by medical genetics. Nat. Biotechnol. 31, 1080–1082 (2013).

  12. Hopkins, A.L. & Groom, C.R. The druggable genome. Nat. Rev. Drug Discov. 1, 727–730 (2002).

  13. Petrovski, S., Wang, Q., Heinzen, E.L., Allen, A.S. & Goldstein, D.B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).

  14. McInnes, B.T., Pedersen, T. & Pakhomov, S.V. UMLS-Interface and UMLS-Similarity: open source software for measuring paths and semantic similarity. AMIA Annu. Symp. Proc. 2009, 431–435 (2009).

  15. Patsopoulos, N.A. et al. Genome-wide meta-analysis identifies novel multiple sclerosis susceptibility loci. Ann. Neurol. 70, 897–912 (2011).

  16. 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

  17. Schadt, E.E. et al. Mapping the genetic architecture of gene expression in human liver. PLoS Biol. 6, e107 (2008).

  18. Boyle, A.P. et al. Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 22, 1790–1797 (2012).

  19. Maurano, M.T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  20. Davis, A.P. et al. The Comparative Toxicogenomics Database: update 2013. Nucleic Acids Res. 41, D1104–D1114 (2013).

  21. Resnik, P. Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J. Artif. Intell. Res. 11, 95–130 (1999).

  22. Lin, D. in Proc. Int. Conf. Machine Learning 296–304 (Morgan Kaufmann Publishers, 1998).

  23. R Development Core Team. R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2014).

  24. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).

Acknowledgements

We would like to thank N. Srivastava for much of the manual mapping of GWASdb traits and Pharmaprojects indications to MeSH terms and P. Agarwal for many helpful conversations. GWASdb-related work was supported by the Research Grants Council, Hong Kong SAR, China (781511M, 17121414M) and the National Natural Science Foundation of China (91229105).

Author information

Contributions

This work was conceived by M.R.N., H.T., L.R.C., J.C.W. and P.S. The primary analyses were designed and conducted by M.R.N. Supporting analyses were provided by J.L.P. and J.S. The mapping of variants to genes was conducted by M.R.N., P.N., Y.S. and A.F. The GWASdb data were created and provided by P.C.S., M.J.L. and J.W. The manuscript was written by M.R.N. with contributions from H.T., J.C.W. and P.S.

Corresponding author

Correspondence to Matthew R Nelson.

Ethics declarations

Competing interests

M.R.N., H.T., J.L.P., J.S., L.R.C., J.C.W. and P.S. are employees of GlaxoSmithKline, a global healthcare company, that may conceivably benefit financially through this publication.

Integrated supplementary information

Supplementary Figure 1 Summary of genetic association data and their traits and gene mappings.

Distribution of the (a) number of publications or sources and (b) reported associations for each unique MeSH term. (c) Distribution of the number of genes mapped for each MeSH term. (d) Distribution of the number of genes mapped to each SNP (excluding SNPs with no genes mapped; n = 5,272). (e) Distribution of P values for all unique associations. These summaries are limited to publications and sources with at least one association with a P value ≤ 1 × 10⁻⁸. Panels b and c were truncated at 50, panel d was truncated at 30 and panel e was truncated at 100. All values over those thresholds are shown at the maximum value. (f) Distribution of the number of genes for each unique MeSH term in OMIM.

Supplementary Figure 2 Summary of the drug data and their target genes and indications.

(a) Distribution of the number of drugs observed for each target gene in the analysis data set (truncated at 50). (b) Distribution of the number of target genes for each drug (i.e., multiple drug targets or combinations of therapeutic agents). (c) Distribution of the number of MeSH terms (i.e., unique indications) for each drug (truncated at 15). (d) Distribution of the number of drugs listed for each MeSH term (truncated at 100). (e) Distribution of the number of target genes for each MeSH term (truncated at 100). (f) Distribution of the number of MeSH terms for each target gene (truncated at 50).

Supplementary Figure 3 Illustrated use of the MeSH ontology to estimate relative similarity.

The methods used in this study (lin and resnik, implemented in UMLS::Similarity) combined both path length and information content. See the Online Methods for additional details.
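As a rough illustration of the two measures named above, the following sketch computes the textbook Resnik and Lin similarities on a small taxonomy. The taxonomy, the term counts and all function names are hypothetical and are not drawn from MeSH or from UMLS::Similarity. How the study rescales these values into a "relative" similarity is not stated here, so that step is left out.

```python
import math

# Hypothetical toy taxonomy (child -> parent); not actual MeSH content.
PARENT = {
    "disease": None,
    "autoimmune": "disease",
    "metabolic": "disease",
    "rheumatoid_arthritis": "autoimmune",
    "lupus": "autoimmune",
    "type_2_diabetes": "metabolic",
}

# Hypothetical usage counts per term, used only to derive information content.
COUNT = {
    "disease": 0, "autoimmune": 2, "metabolic": 3,
    "rheumatoid_arthritis": 5, "lupus": 4, "type_2_diabetes": 6,
}


def ancestors(term):
    """Return the term followed by all of its ancestors, nearest first."""
    chain = []
    while term is not None:
        chain.append(term)
        term = PARENT[term]
    return chain


def subtree_count(term):
    """Sum the counts of a term and all of its descendants."""
    return sum(c for t, c in COUNT.items() if term in ancestors(t))


TOTAL = subtree_count("disease")


def ic(term):
    """Information content: negative log of the term's cumulative frequency."""
    return -math.log(subtree_count(term) / TOTAL)


def lcs(a, b):
    """Lowest common subsumer: the nearest ancestor shared by both terms."""
    b_anc = set(ancestors(b))
    return next(t for t in ancestors(a) if t in b_anc)


def resnik(a, b):
    """Resnik similarity: information content of the lowest common subsumer."""
    return ic(lcs(a, b))


def lin(a, b):
    """Lin similarity: shared information content scaled by the terms' own."""
    denom = ic(a) + ic(b)
    return 2 * ic(lcs(a, b)) / denom if denom > 0 else 1.0
```

In this toy tree, two terms that meet only at the root share no information, so both measures return zero for them, whereas sibling terms under a specific parent score higher.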

Supplementary Figure 4 Overlap of drug targets with genetic associations by disease category and latest development phase.

(a) Overlap between drug targets and their indications with genetic associations for similar traits. The percentage of target-indication pairs overlapping with gene-trait combinations from GWASdb or OMIM for the latest development phase each pair achieved as recorded in Pharmaprojects. The number of unique target-indication pairs for each category at each phase is shown to the right of each plot. Exact 95% confidence intervals are shown. (b) Distribution of the number of target-indication pairs at each phase by category.
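The exact confidence intervals mentioned above can be sketched as follows, assuming that "exact" refers to the Clopper-Pearson binomial interval, which is the usual meaning but is not confirmed by this text. The function names are illustrative and the counts in the usage note are made up; this is not the authors' code.

```python
from math import exp, lgamma, log


def binom_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p), computed in log space for stability."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log(1.0 - p))


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


def _solve_decreasing(f, target, iters=60):
    """Bisection on [0, 1] for a function f that decreases as p grows."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid  # f is still too large, so the root lies at a larger p
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k, n, level=0.95):
    """Clopper-Pearson interval for k overlapping pairs out of n."""
    alpha = 1.0 - level
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper
```

For example, `exact_ci(8, 100)` returns an interval around the observed 8% for a hypothetical category with 8 supported pairs out of 100. The interval is asymmetric when the observed proportion is close to zero, which matters for the small percentages reported here.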

Supplementary Figure 5 Overlap between drug targets and their indications with genetic associations for similar traits with genetic associations restricted to GWASdb only.

Overlap for (a) drugs approved in the United States or European Union and (b) the furthest development phase to which each target-indication pair progressed. Exact 95% confidence intervals are shown.

Supplementary Figure 6 Overlap between drug targets and their indications with genetic associations for similar traits with genetic associations restricted to OMIM only.

Overlap for (a) drugs approved in the United States or European Union and (b) the furthest development phase to which each target-indication pair progressed. Exact 95% confidence intervals are shown.

Supplementary Figure 7 Tradeoff between the number of indications studied and overall genetic support.

The tradeoff between the number of indications studied and overall genetic support when setting a lower bound on the number of independent genes associated with a trait related to each indication (relative similarity ≥ 0.7), restricted to drugs approved in the United States or European Union. The percentage of target-indication pairs with genetic support increases as indications are restricted to those with the most genetic information available, although at the cost of considering far fewer indications. The analyses reported in Figure 3 and Supplementary Figures 4–6 selected five as the threshold, where the first enrichment plateau is observed.
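The thresholding described above can be sketched as a small counting routine. It assumes that a target-indication pair counts as genetically supported when the same gene has an association with a trait whose relative similarity to the indication is at least 0.7, and that an indication is retained only when at least `min_genes` distinct genes are associated with such traits. The data structures and names are hypothetical, and the toy identifiers in the usage note are placeholders rather than real genes or MeSH terms.

```python
def similarity(sim, a, b):
    """Look up a symmetric similarity; identical terms score 1, unknown pairs 0."""
    if a == b:
        return 1.0
    return sim.get((a, b), sim.get((b, a), 0.0))


def genes_near(indication, assoc, sim, min_sim):
    """Genes associated with any trait sufficiently similar to the indication."""
    return {gene for gene, trait in assoc
            if similarity(sim, indication, trait) >= min_sim}


def genetic_support(pairs, assoc, sim, min_sim=0.7, min_genes=5):
    """Return (supported, kept) counts over target-indication pairs.

    pairs: iterable of (gene, indication) tuples
    assoc: list of (gene, trait) genetic associations
    sim:   dict mapping (term, term) to a relative similarity
    """
    kept = supported = 0
    for gene, indication in pairs:
        near = genes_near(indication, assoc, sim, min_sim)
        if len(near) < min_genes:
            continue  # indication has too little genetic information
        kept += 1
        supported += gene in near
    return supported, kept
```

With `pairs = [("G1", "ind_A"), ("G9", "ind_A"), ("G2", "ind_B")]`, `assoc = [("G1", "trait_A"), ("G2", "trait_A"), ("G3", "ind_A"), ("G2", "trait_B")]` and `sim = {("ind_A", "trait_A"): 0.8, ("ind_B", "trait_B"): 0.9, ("ind_A", "trait_B"): 0.3}`, raising `min_genes` from 1 to 2 drops `ind_B` from the analysis. This is the tradeoff the figure describes: a stricter threshold leaves fewer indications but raises the share with support.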

Supplementary Figure 8 Distribution of the number of genes associated with traits similar (≥0.7) to the indications included in the analysis of overlap with genetic associations.

The few indications with very large numbers of associated genes were truncated at 50. (The full range is available in Supplementary Table 5.) The box corresponds to the interquartile range, the center line corresponds to the median, the whiskers correspond to the maximum or 1.5 times the interquartile range (whichever is largest) and the points identify further outliers. The numbers given on the y axis are the numbers of unique indications observed in each phase. There is no statistically significant variability among phases (P = 0.18; analysis of the rank of the number of associations with phase as an ordered variable) or linear trend (P = 0.37; analysis of the rank of the number of associations with phase as numeric, with 1 = preclinical and 5 = approved in the United States or European Union).

Supplementary Figure 9 System for scoring the strength of evidence tying a variant with a phenotypic association to a gene.

For variant function, “DHS Rdb 3” indicates that the variant has a RegulomeDB score of 3 and falls within a proximal or distal DHS site, “eQTL or DHS (2)” indicates that the variant was either identified as an eQTL in the University of Chicago eQTL database or had a RegulomeDB score of 2 and “eQTL & DHS” indicates that the variant was both identified as an eQTL and fell within a DHS site with a RegulomeDB score of 2 or less. LD is in the form of r².

Supplementary Figure 10 Permutation test of overlap between approved drug target–indications and genetic evidence (GWASdb or OMIM).

(a) The permutation scheme to simulate the null distribution. (b) The distribution of the percent of gene-trait and target-indication pairs that overlap over 10,000 permutations and the overlap observed in the original data (red downward arrow). (c) The overlap observed in the original data overall and by disease category (red points) and the median percent overlap over 10,000 permutations (red ×).
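The permutation scheme in panel a is not reproduced in this text, so the sketch below shows one generic possibility rather than the scheme actually used. It assumes that indications are shuffled across targets and that overlap with a fixed set of supported gene-trait pairs is recomputed on each shuffle. Exact matching stands in for the similarity-based matching, and all names are hypothetical.

```python
import random


def overlap_fraction(pairs, supported):
    """Share of (gene, indication) pairs found in the supported set."""
    return sum(pair in supported for pair in pairs) / len(pairs)


def permutation_test(pairs, supported, n_perm=10000, seed=0):
    """Return the observed overlap and a one-sided permutation P value.

    pairs:     list of (gene, indication) tuples
    supported: set of (gene, trait) tuples with genetic evidence
    """
    rng = random.Random(seed)
    genes = [gene for gene, _ in pairs]
    indications = [indication for _, indication in pairs]
    observed = overlap_fraction(pairs, supported)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(indications)  # break the gene-indication pairing
        permuted = list(zip(genes, indications))
        hits += overlap_fraction(permuted, supported) >= observed
    # Add one to numerator and denominator so the P value is never zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling indications keeps the number of drugs per indication and per target fixed while removing any real link between them. The median of the permuted overlaps would correspond to the kind of null reference value plotted in panel c.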

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–10 and Supplementary Note. (PDF 2117 kb)

Supplementary Table 1: Count of publications, associations and genes corresponding to each MeSH term.

MeSH: unique MeSH terms mapped to GWASdb traits. Publications: the number of publications or unique data sources reporting associations with the MeSH term. Associations: the number of unique SNPs reported to be associated with the MeSH term. Genes: the number of unique genes to which the SNPs associated with the MeSH term are mapped. (XLSX 26 kb)

Supplementary Table 2: Count of drugs and genes mapping to each MeSH term.

MeSH: unique MeSH terms mapped to indications. Drugs: the number of drugs in Pharmaprojects with the MeSH term as an indication. Genes: the number of genes reported to be targets for the drugs with the MeSH term as an indication. (XLSX 30 kb)

Supplementary Table 3: Count of drugs and MeSH terms corresponding to each drug target gene.

Gene: unique drug target genes. Drugs: the number of drugs reported to target the gene product. MeSH: the number of MeSH terms for which the drugs targeting the gene product are indicated. (XLSX 51 kb)

Supplementary Table 4: Count of target genes and MeSH terms corresponding to each drug.

Drug: unique drugs. Genes: the number of genes reported to be targets for the drug. MeSH: the number of MeSH terms indicated for the drug. (XLSX 599 kb)

Supplementary Table 5: All indications for drugs with reported human drug targets with the number of genes associated with them.

MSH.Ind: drug indication (mapped to the best MeSH term). N.Traits: the number of indications or traits with a relative similarity ≥0.7 to the indication. N.Assns: the number of independent associations reported for the indication or a similar trait. Traits: list of traits with a relative similarity ≥0.7 to the indication; may be other indications without a corresponding genetic trait. lApprovedUS.EU: logical indication of whether the indication is for a drug approved in the United States or European Union. (XLSX 45 kb)

Supplementary Table 6: Drug targets with a genetic association (GWASdb or OMIM) mapped to the same gene for a trait with relative similarity ≥0.7 to the corresponding indication.

Gene: drug target gene. MSH.Ind: drug indication (mapped to the best MeSH term). Category: indication disease category. MSH.Trt: MeSH term for the trait with a genetic association to the drug target gene. pvalue: the P value for the genetic association. If multiple associations mapped the same gene to the same trait, the smallest P value with the largest gene score was selected. P values of zero indicate that the association is from OMIM. eCat: extension of the RegulomeDB category. If the variant mapping the association to the listed gene was not due to a DHS-correlated enhancer, a value of "0" was assigned to amino acid–changing variants, "2" was assigned for eQTLs and "9" was assigned otherwise. Associations from OMIM were not assigned a value. Rank: the rank of the mapping of the given gene for the reported association. RelSim: relative similarity between the indication and trait. LatestPhase: the latest phase that each unique gene-indication pair achieved in Pharmaprojects. (XLSX 2587 kb)

Supplementary Table 7: Counts of target-indication pairs included in the analyses presented in Figure 3b and used to estimate the enrichment of genetic associations.

Phase.Latest: latest phase in the development pipeline to which each target-indication combination has progressed. Num.targets: total number of target-indication combinations progressing to that phase. N.Overlap: number of target-indication combinations that overlap with a gene-trait association. Percent: percent of target-indication combinations that overlap with a gene-trait association. Genetic.Evidence: the source of the genetic evidence. (XLSX 11 kb)

Supplementary Table 8: Manually scored MeSH term similarities to values of 0.9, 0.75 and 0.5 reflecting a subjective measure of similarity not captured via the ontological relationships.

MSH1: MeSH term 1. MSH2: MeSH term 2. ManSim: manually assigned similarity. RelSim: relative similarity averaging Resnik and Lin relative similarity measures. RelSim.Res: Resnik relative similarity. RelSim.Lin: Lin relative similarity. (XLSX 28 kb)

Supplementary Table 9: Manual assignment of traits or drug indications (MeSH terms) to disease categories.

MSH: MeSH term. MSH.Top: MeSH term for the top level of the MeSH hierarchy. Disease: the original disease trait. Indication: the original indication. Category: manually assigned trait or indication category. (XLSX 151 kb)

Supplementary Data Set 1: GWASdb entries with MeSH terms mapped for each trait and genes annotated as described in the Online Methods.

Disease: the name of the trait for the corresponding genetic association as provided by GWASdb, which is generally taken directly from the GWAS catalog or whatever source from which the association was derived. snp_id: the identifier (generally a dbSNP rs ID) of the SNP reported to be associated with disease. Link: the reference for the association. Most references are a PubMed ID for the published paper. pvalue: the P value reported for the association of snp_id with disease. Source: the origin of the association information. It may be one of the following: GWAS:A/B: results listed in the NHGRI GWAS Catalog. The associations of GWAS:B are drawn from published tables and supplementary information. Omim: from Online Mendelian Inheritance in Man. All P values are zero. GWASCentral: from GWAS Central. dbGaP: associations from dbGaP. SNP.Trait.Cnt: the number of associations of the same snp_id with the same disease in the original data set. These have been reduced to a single row in this data set, and the minimum P value was selected. MSH: Medical Subject Heading for disease. Manually mapped by Computational Biology. MSH.Top: the MeSH term for the top level of the branch to which the trait is mapped. In most instances, there are many branches to which a single MSH may be mapped. When this occurs, the most common top-level term in GWASdb is selected. snp.ld: a SNP in linkage disequilibrium (LD) with snp_id that provides a plausible connection to a gene. Gene: a gene for which snp.ld lies within 5 kb, is an eQTL, sits in a correlated DNase I hypersensitivity site or falls within the transcription start site. r2: LD between snp_id and snp.ld. eqtl: indicates whether snp.ld is an eQTL for a gene. The eQTL data are drawn from eqtl.uchicago.edu. rdb: indicates whether the snp.ld-gene mapping is the result of a DHS correlation (from Maurano et al. (2012), provided by J. Stamatoyannopoulos). Cat.rdb: RegulomeDB category of the SNP (if rdb is "yes"). Lower values indicate more lines of converging functional evidence. eCat: a derivative of Cat.rdb, filling in values where rdb is "no." If eqtl is "yes" but rdb is "no," then the value is 2. If snp.ld is a missense variant (amino acid change), the value is 0. The value is 9 otherwise. AAEffect: amino acid effect of snp.ld. AAScore: Condel score from VEP for nonsynonymous variants. GeneScore: an overall assessment of the evidence that the associated variant has a causal effect on the gene in question, ranging from values of zero to eight. Higher scores imply higher weight of causal evidence. The contributions to GeneScore are summarized on a separate GeneScore Wiki page. Rank: the rank for the given gene for its strength of connection to snp_id. This takes LD and functional evidence into account. (TXT 12043 kb)
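The eCat rule described above can be written as a short function. The sketch assumes that a missense variant takes precedence over an eQTL when both apply, an ordering the description does not state explicitly. The function and argument names are illustrative only.

```python
def ecat(rdb, cat_rdb, eqtl, missense):
    """Derive eCat from the fields of one row.

    rdb, eqtl: "yes" or "no", as in the data set description
    cat_rdb:   RegulomeDB category, used only when rdb is "yes"
    missense:  True if snp.ld changes an amino acid
    """
    if rdb == "yes":
        return cat_rdb  # DHS-correlated mapping keeps its RegulomeDB category
    if missense:
        return 0        # assumed to take precedence over the eQTL rule
    if eqtl == "yes":
        return 2
    return 9            # no functional evidence recorded
```

Under this rule, a row with rdb "no", eqtl "yes" and no amino acid change receives a value of 2, and a row with none of the three kinds of evidence receives 9.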

Supplementary Data Set 2: Reduction of the genetic association data to a single row per gene and trait as used for most analyses described.

Variable names are as given for Supplementary Data Set 1. (TXT 2903 kb)
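A reduction to one row per gene and trait might look like the sketch below. It assumes the selection rule given for Supplementary Table 6 also applies here, and it reads that rule as "smallest P value first, largest GeneScore as the tie-breaker", which is one plausible interpretation. The dictionary keys mirror the column names in Supplementary Data Set 1; the function name and the example rows are hypothetical.

```python
def reduce_to_gene_trait(rows):
    """Keep one row per (Gene, MSH) pair.

    rows: iterable of dicts with keys "Gene", "MSH", "pvalue" and "GeneScore".
    Preference is given to the smallest P value, then the largest GeneScore.
    """
    best = {}
    for row in rows:
        key = (row["Gene"], row["MSH"])
        rank = (row["pvalue"], -row["GeneScore"])
        if key not in best or rank < best[key][0]:
            best[key] = (rank, row)
    return [row for _, row in best.values()]


# Made-up rows for illustration; not taken from GWASdb.
example_rows = [
    {"Gene": "G1", "MSH": "Trait A", "pvalue": 1e-9, "GeneScore": 3},
    {"Gene": "G1", "MSH": "Trait A", "pvalue": 1e-12, "GeneScore": 2},
    {"Gene": "G1", "MSH": "Trait A", "pvalue": 1e-12, "GeneScore": 5},
    {"Gene": "G2", "MSH": "Trait A", "pvalue": 0.0, "GeneScore": 0},
]
reduced = reduce_to_gene_trait(example_rows)
```

Here the three G1 rows collapse to the one with P = 1e-12 and GeneScore 5. Because OMIM entries carry P values of zero, they always win this comparison against GWAS-derived rows for the same gene and trait.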

Supplementary Data Set 3: Data set of unique target-indication combinations in Pharmaprojects.

Gene: drug target gene. MSH: MeSH term for drug indication. MSH.Top: top-level MeSH term. Phase.Latest: latest phase to which the target-indication pair progressed through the development pipeline. lApprovedUS.EU: indicator of whether a drug for the target-indication pair has been approved in the United States or European Union. (TXT 1485 kb)

Supplementary Data Set 4: Relative similarity matrix of MeSH terms.

Row and column names correspond to each MeSH term for which a relative similarity could be computed. (TXT 58752 kb)

Cite this article

Nelson, M., Tipney, H., Painter, J. et al. The support of human genetic evidence for approved drug indications. Nat Genet 47, 856–860 (2015). https://doi.org/10.1038/ng.3314

  • DOI: https://doi.org/10.1038/ng.3314
