Genome Res. 2008 Nov;18(11):1752-62.
doi: 10.1101/gr.080663.108. Epub 2008 Aug 5.

Evolution of the mammalian transcription factor binding repertoire via transposable elements

Guillaume Bourque et al. Genome Res. 2008 Nov.

Abstract

Identification of lineage-specific innovations in genomic control elements is critical for understanding transcriptional regulatory networks and phenotypic heterogeneity. We analyzed, from an evolutionary perspective, the binding regions of seven mammalian transcription factors (ESR1, TP53, MYC, RELA, POU5F1, SOX2, and CTCF) identified on a genome-wide scale by different chromatin immunoprecipitation approaches and found that only a minority of sites appear to be conserved at the sequence level. Instead, we uncovered a pervasive association with genomic repeats by showing that a large fraction of the bona fide binding sites for five of the seven transcription factors (ESR1, TP53, POU5F1, SOX2, and CTCF) are embedded in distinctive families of transposable elements. Using the age of the repeats, we established that these repeat-associated binding sites (RABS) have been associated with significant regulatory expansions throughout the mammalian phylogeny. We validated the functional significance of these RABS by showing that they are over-represented in proximity of regulated genes and that the binding motifs within these repeats have undergone evolutionary selection. Our results demonstrate that transcriptional regulatory networks are highly dynamic in eukaryotic genomes and that transposable elements play an important role in expanding the repertoire of binding sites.

Figures

Figure 1.
Limited evolutionary conservation of transcription factor binding regions. (A) Gray bars show the percentage of binding regions that are conserved based on either an overlap with a phastCons conserved element (left panel) or the presence of a conserved binding motif (right panel). ESR1 is the ESR1 ChIP-paired-end diTag (ChIP-PET) data set (Lin et al. 2007) while ESR1-CC is the ESR1 ChIP-chip data set (Carroll et al. 2006). Conservation levels expected by chance are shown in white and are computed from simulated binding data sets (see Methods). (B) Gray bars show the percentage of binding regions for ESR1, MYC, and CTCF that have a conserved binding motif where the regions are further partitioned into four categories: adjacent (within 250 bp of the coding region of a gene), proximal (within 5 kbp of a coding region), distant (intragenic or within 100 kbp of a gene), or desert (>100 kbp from any gene). Conservation levels expected by chance are shown in white. Error bars, 1 SD.
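The four-way partition used in panel B can be sketched in a few lines of Python. This is only an illustration of the thresholds given in the caption, not the authors' code: the function name `classify_region`, the `intragenic` flag, and the choice to treat each threshold as inclusive are assumptions made here.

```python
def classify_region(distance_bp: int, intragenic: bool = False) -> str:
    """Assign a binding region to one of the four Figure 1B categories.

    distance_bp: distance from the binding region to the nearest coding
                 region, in base pairs (hypothetical input).
    intragenic:  True if the region lies inside a gene (hypothetical flag).
    Boundary handling (<= vs <) is an assumption; the caption does not say.
    """
    if distance_bp < 0:
        raise ValueError("distance_bp must be non-negative")
    if distance_bp <= 250:          # within 250 bp of a coding region
        return "adjacent"
    if distance_bp <= 5_000:        # within 5 kbp of a coding region
        return "proximal"
    if intragenic or distance_bp <= 100_000:  # intragenic or within 100 kbp
        return "distant"
    return "desert"                 # more than 100 kbp from any gene


if __name__ == "__main__":
    for d in (100, 3_000, 50_000, 250_000):
        print(d, classify_region(d))
```

The categories are checked from nearest to farthest, so a region that satisfies several thresholds is assigned to the closest one.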
Figure 2.
Pervasive association between transcription factor binding regions and transposable elements. (A) Enrichment of specific repeat families in the binding regions of distinct transcription factors. Heatmap shows the percentage of instances of a specific family of repeats that is in excess (yellow) or in deficit (purple) as compared to expected levels. Values were computed for the seven binding data sets and also for background data sets (labeled with “-B”) consisting of only singleton PETs (for ChIP-PET), randomly selected Affymetrix probes (for ChIP-chip), or singleton tags (for ChIP-sequencing [ChIP-Seq]). The specific repeats from the four repeat families showing enrichment are highlighted on the right. These four repeat families are: MIR (mammalian interspersed repeat, a SINE repeat), ERVK (mouse endogenous retrovirus K, an LTR repeat), ERV1 (human endogenous retrovirus 1, an LTR repeat), and B2 (a rodent-specific SINE repeat). (B) Two examples showing ChIP sequencing clusters detecting binding regions in repeat-rich genomic sequences. In the first example, the binding region is identified with three fragments from the POU5F1 ChIP-PET library and four fragments from the SOX2 ChIP-PET library. In the second example, only the tag density is shown for the CTCF ChIP-Seq library.
Figure 3.
Transposable elements harbor progenitor sequences for ESR1, TP53, POU5F1-SOX2, and CTCF binding motifs. (A) The same regions of the repeats harbor sequence binding motifs and are observed to be bound by the transcription factor. Filled areas represent the number of instances, at a given position relative to the consensus sequence, observed to be bound by ESR1, TP53, POU5F1-SOX2, and CTCF, respectively. Similarly, the green, purple, red, and orange curves show the number of instances of the ESR1, TP53, POU5F1-SOX2, and CTCF motifs at a given position across all instances of that repeat in the genome. (B) Multiple sequence alignment of the 17 bound instances of the RLTR11B repeat. Columns with >90% identity are in blue and highlight two regions of high sequence similarity. The first region is where the POU5F1-SOX2 motif (Loh et al. 2006) is detectable. Genomic positions of the repeat instances are shown on the right.
Figure 4.
Evolution of the mammalian transcription factor binding repertoire via transposable elements. (A) Two evolutionary models for the gain of transcription factor binding sites: (1) via point mutations only or (2) by the insertion of a transposable element in which the seed of a binding motif is embedded. (B) Overlaying the age of the repeats on the species tree determines the age of the RABS. The time scale is in millions of years and divergence times are from Murphy et al. (2007). (C) RABS constitute a large fraction of the nonconserved binding regions of TP53, POU5F1-SOX2, and CTCF. Venn diagrams show the number of conserved and nonconserved binding regions that also correspond to RABS. The CTCF binding regions, which were detected in mouse embryonic stem cells, are also compared to a set of CTCF binding regions detected in human T cells (Barski et al. 2007).
Figure 5.
Transposable elements are enriched for bound motifs and are associated with regulated genes. (A) Ratio between the fraction of motifs within a given repeat subfamily that is observed to be bound and the fraction of motifs that is expected to be bound. The x-axis represents the estimated age of the repeat subfamily (in millions of years). Two subfamilies of the B2 repeat associated with CTCF are highlighted: B3A and B2_Mm1a. (B) Gray bars indicate the percentage of ESR1 and POU5F1-SOX2 binding regions with and without repeats that are within 10 kb of a regulated gene. Expected levels based on a random set of genes are shown in white. An additional control is shown using a random sample of instances from the same repeat families. Error bars, 1 SD.
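The ratio plotted in panel A can be written out as a short Python sketch. It assumes the expected bound fraction is supplied as an input, since the caption does not say how it was derived; the function name `bound_motif_enrichment` and its parameters are hypothetical.

```python
def bound_motif_enrichment(n_bound: int, n_motifs: int,
                           expected_fraction: float) -> float:
    """Observed-to-expected ratio of bound motifs in one repeat subfamily.

    n_bound:           motif instances in the subfamily observed to be bound.
    n_motifs:          all motif instances in the subfamily.
    expected_fraction: fraction of motifs expected to be bound (assumed to be
                       given; its derivation is not stated in the caption).
    """
    if n_motifs <= 0:
        raise ValueError("n_motifs must be positive")
    if not 0 <= n_bound <= n_motifs:
        raise ValueError("n_bound must lie between 0 and n_motifs")
    if expected_fraction <= 0:
        raise ValueError("expected_fraction must be positive")
    observed_fraction = n_bound / n_motifs
    return observed_fraction / expected_fraction


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the paper.
    print(bound_motif_enrichment(50, 200, 0.125))
```

A ratio above 1 means more motifs in that subfamily are bound than expected; a ratio below 1 means fewer.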

References

    1. Barski A., Cuddapah S., Cui K., Roh T.Y., Schones D.E., Wang Z., Wei G., Chepelev I., Zhao K. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–837.
    2. Bejerano G., Lowe C.B., Ahituv N., King B., Siepel A., Salama S.R., Rubin E.M., Kent W.J., Haussler D. A distal enhancer and an ultraconserved exon are derived from a novel retroposon. Nature. 2006;441:87–90.
    3. Bell A.C., West A.G., Felsenfeld G. The protein CTCF is required for the enhancer blocking activity of vertebrate insulators. Cell. 1999;98:387–396.
    4. Birney E., Stamatoyannopoulos J.A., Dutta A., Guigo R., Gingeras T.R., Margulies E.H., Weng Z., Snyder M., Dermitzakis E.T., Thurman R.E., et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature. 2007;447:799–816.
    5. Boffelli D., Nobrega M.A., Rubin E.M. Comparative genomics at the vertebrate extremes. Nat. Rev. Genet. 2004;5:456–465.
