Abstract
Vertebrate genomic DNA is generally CpG depleted [1,2], possibly because methylation of cytosines at 80% of CpG dinucleotides results in their frequent mutation to thymine, and thus in the conversion of CpG to TpG dinucleotides [3]. There are, however, genomic regions of high G+C content (CpG islands) where the occurrence of CpGs is significantly higher, close to the expected frequency, whereas the methylation level is significantly lower than in the genome overall [4]. CpG islands [5] are longer than 200 bp, have a G+C content above 50% and have a CpG frequency at least 0.6 of that statistically expected. Approximately 50% of mammalian gene promoters are associated with one or more CpG islands [6]. Although biologists often intuitively use CpG islands for 5′ gene identification [7,8], this has not been rigorously quantified [9]. We have determined the features that discriminate promoter-associated from non-associated CpG islands. This led to an effective algorithm for large-scale promoter mapping (with 2-kb resolution) with a rate of false-positive promoter predictions much lower than previously obtained. Using this algorithm, we correctly discriminated approximately 85% of the CpG islands within an interval (−500 to +1500) around a transcriptional start site (TSS) from those that lie further away from TSSs. We also correctly mapped approximately 93% of the promoters containing CpG islands.
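The three CpG island criteria quoted above (length, G+C content, and observed-to-expected CpG ratio) can be illustrated with a minimal Python sketch. It is not the promoter-discrimination algorithm reported here. It only checks the island definition, assuming the commonly used observed/expected formula (number of CpG × N) / (number of C × number of G) for a sequence of length N. The function names `cpg_island_stats` and `meets_island_criteria` are hypothetical and chosen for illustration.

```python
from collections import Counter


def cpg_island_stats(seq: str) -> dict:
    """Return length, G+C fraction and observed/expected CpG ratio of a sequence."""
    seq = seq.upper()
    n = len(seq)
    counts = Counter(seq)
    c, g = counts["C"], counts["G"]
    # "CG" cannot overlap itself, so str.count gives the exact CpG count.
    cpg = seq.count("CG")
    gc_fraction = (c + g) / n if n else 0.0
    # Assumed formula: expected CpG count = (#C * #G) / N.
    expected = (c * g) / n if n else 0.0
    obs_exp = cpg / expected if expected else 0.0
    return {"length": n, "gc_fraction": gc_fraction, "obs_exp": obs_exp}


def meets_island_criteria(seq: str,
                          min_length: int = 200,
                          min_gc: float = 0.5,
                          min_obs_exp: float = 0.6) -> bool:
    """Check the thresholds stated in the abstract:
    longer than 200 bp, G+C above 50%, CpG obs/exp of at least 0.6."""
    s = cpg_island_stats(seq)
    return (s["length"] > min_length
            and s["gc_fraction"] > min_gc
            and s["obs_exp"] >= min_obs_exp)


if __name__ == "__main__":
    toy = "CG" * 150  # artificial 300-bp sequence, for illustration only
    print(cpg_island_stats(toy))
    print(meets_island_criteria(toy))
```

In practice these statistics are usually evaluated over a sliding window along a genomic sequence rather than on a single pre-cut fragment; the window size and step would be additional assumptions not stated in the abstract.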
References
Bird, A.P. DNA methylation and the frequency of CpG in animal DNA. Nucleic Acids Res. 8, 1499–1504 (1980).
Jones, P.A., Rideout, W.M. 3d, Shen, J.C., Spruck, C.H. & Tsai, Y.C. Methylation, mutation and cancer. Bioessays 14, 33–36 (1992).
Bird, A. DNA methylation de novo. Science 286, 2287–2288 (1999).
Antequera, F. & Bird, A. CpG islands. EXS 64, 169–185 (1993).
Gardiner-Garden, M. & Frommer, M. CpG islands in vertebrate genomes. J. Mol. Biol. 196, 261–282 (1987).
Antequera, F. & Bird, A. Number of CpG islands and genes in human and mouse. Proc. Natl Acad. Sci. USA 90, 11995–11999 (1993).
Cross, S.H. & Bird, A.P. CpG islands and genes. Curr. Opin. Genet. Dev. 5, 309–314 (1995).
Dunham, I. et al. The DNA sequence of human chromosome 22. Nature 402, 489–495 (1999).
Pedersen, A.G., Baldi, P., Chauvin, Y. & Brunak, S. The biology of eukaryotic promoter prediction—a review. Comput. Chem. 23, 191–207 (1999).
Venables, W.N. & Ripley, B.D. Modern Applied Statistics with S-Plus (Springer, New York, 1994).
McLachlan, G.J. Discriminant Analysis and Statistical Pattern Recognition (Wiley, New York, 1992).
Prestridge, D.S. Predicting Pol II promoter sequences using transcription factor binding sites. J. Mol. Biol. 249, 923–932 (1995).
Toyota, M. & Issa, J.P. CpG island methylator phenotypes in aging and cancer. Semin. Cancer Biol. 9, 349–357 (1999).
Baylin, S.B. & Herman, J.G. DNA hypermethylation in tumorigenesis: epigenetics joins genetics. Trends Genet. 16, 168–174 (2000).
Barlow, D.P. Gametic imprinting in mammals. Science 270, 1610–1613 (1995).
Singer-Sam, J. & Riggs, A.D. X chromosome inactivation and DNA methylation. EXS 64, 358–384 (1993).
Larsen, F., Gundersen, G., Lopez, R. & Prydz, H. CpG islands as gene markers in the human genome. Genomics 13, 1095–1107 (1992).
Cross, S.H., Charlton, J.A., Nan, X. & Bird, A.P. Purification of CpG islands using a methylated DNA binding column. Nature Genet. 6, 236–244 (1994).
Cross, S.H., Clark, V.H. & Bird, A.P. Isolation of CpG islands from large genomic clones. Nucleic Acids Res. 27, 2099–2107 (1999).
Zhang, M.Q. in Proceedings of Pacific Symposium on Biocomputing 1998 (eds Altman, R.B. et al.) 240–251 (World Scientific, Singapore, 1998).
Zhang, M.Q. Identification of protein coding regions in the human genome based on quadratic discriminant analysis. Proc. Natl Acad. Sci. USA 94, 565–568 (1997).
Zhang, M.Q. Statistical features of human exons and their flanking regions. Hum. Mol. Genet. 7, 919–932 (1998).
Acknowledgements
We thank R. Bari for assistance in sequence annotation; T. Zhang for assistance in testing CpG_promoter; S.H. Cross for discussions; P. Rice and R. Lopez for consultations about the EMBOSS project and CpGPlot program; and J. Locker and S. Emmons for editing of the text. This work was supported by National Institutes of Health Grant HG01696 to M.Q.Z.
About this article
Cite this article
Ioshikhes, I., Zhang, M. Large-scale human promoter mapping using CpG islands. Nat Genet 26, 61–63 (2000). https://doi.org/10.1038/79189
DOI: https://doi.org/10.1038/79189