BMC Bioinformatics. 2007 Jan 22;8:19.
doi: 10.1186/1471-2105-8-19.

Statistical significance of cis-regulatory modules

Dustin E Schones et al. BMC Bioinformatics. 2007.

Abstract

Background: It is becoming increasingly important for researchers to be able to scan through large genomic regions for transcription factor binding sites or clusters of binding sites forming cis-regulatory modules. Correspondingly, there has been a push to develop algorithms for the rapid detection and assessment of cis-regulatory modules. While various algorithms for this purpose have been introduced, most are not well suited for rapid, genome scale scanning.

Results: We introduce methods designed for the detection and statistical evaluation of cis-regulatory modules, modeled as either clusters of individual binding sites or as combinations of sites with constrained organization. In order to determine the statistical significance of module sites, we first need a method to determine the statistical significance of single transcription factor binding site matches. We introduce a straightforward method of estimating the statistical significance of single site matches using a database of known promoters to produce data structures that can be used to estimate p-values for binding site matches. We next introduce a technique to calculate the statistical significance of the arrangement of binding sites within a module using a max-gap model. If the module scanned for has defined organizational parameters, the probability of the module is corrected to account for organizational constraints. The statistical significance of single site matches and the architecture of sites within the module can be combined to provide an overall estimation of statistical significance of cis-regulatory module sites.

Conclusion: The methods introduced in this paper allow for the detection and statistical evaluation of single transcription factor binding sites and cis-regulatory modules. The features described are implemented in the Search Tool for Occurrences of Regulatory Motifs (STORM) and MODSTORM software.

Figures

Figure 1
Converting between p-values and match scores using (g, k)-tables. Based on a sequence database, a (g, k)-table encodes sufficient information to calculate match score p-values, or produce a score cutoff corresponding to a given p-value. Details of this process are described in the text.
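
The two conversions named in this caption (match score to p-value, and p-value to score cutoff) can be illustrated with a plain empirical background distribution. The sketch below is not the (g, k)-table structure itself, whose details are given in the full text; it assumes only that a list of match scores from a background sequence database is available. The function names `build_background`, `p_value`, and `score_cutoff` are hypothetical.

```python
import bisect
from typing import List, Sequence


def build_background(scores: Sequence[float]) -> List[float]:
    """Sort background match scores once so later lookups are O(log n)."""
    if not scores:
        raise ValueError("background score list must not be empty")
    return sorted(scores)


def p_value(background: List[float], score: float) -> float:
    """Empirical p-value: fraction of background scores >= the given score."""
    n = len(background)
    first_at_or_above = bisect.bisect_left(background, score)
    return (n - first_at_or_above) / n


def score_cutoff(background: List[float], p: float) -> float:
    """Return the k-th largest background score, with k = floor(p * n).

    With tied scores the achieved empirical p-value at this cutoff can be
    larger than p, so treat the result as an approximation.
    """
    n = len(background)
    k = int(p * n)
    if k < 1:
        raise ValueError("p is too small to resolve with this background size")
    return background[n - k]


if __name__ == "__main__":
    # Toy background: ten hypothetical match scores, not real promoter data.
    bg = build_background([3, 7, 1, 9, 4, 10, 2, 8, 6, 5])
    print(p_value(bg, 9))
    print(score_cutoff(bg, 0.5))
```

Sorting once and using binary search keeps each lookup cheap, which matters when the same background is queried for every candidate site in a genome-scale scan.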
Figure 2
Max-gap cluster – module without organizational constraints. A max-gap cluster of motifs as a module without organizational constraints. The length of the sequence (l) is 16 bases, the number of motifs (m) is four, the width of each motif (wi) is two bases and the max-gap (g) is three bases.
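
As a minimal sketch of the max-gap idea, the code below groups binding site hits into clusters in which consecutive hits are separated by at most `max_gap` bases. It covers only the grouping step and does not compute the statistical significance described in the abstract. The tuple layout, the `min_sites` parameter, and the function name `max_gap_clusters` are assumptions made for illustration; the example coordinates are one layout consistent with the caption's parameters (16 bases, four motifs of width two, max-gap three), not necessarily the one drawn in the figure.

```python
from typing import List, Tuple

# (start, width, motif_name), with 0-based start coordinates.
Site = Tuple[int, int, str]


def max_gap_clusters(
    sites: List[Site], max_gap: int, min_sites: int = 2
) -> List[List[Site]]:
    """Group sites so that the gap between neighbours never exceeds max_gap.

    The gap is the number of bases between the end of the cluster so far
    and the start of the next site; overlapping sites have a gap <= 0.
    """
    clusters: List[List[Site]] = []
    current: List[Site] = []
    current_end = 0
    for site in sorted(sites):
        start, width, _ = site
        # Close the running cluster when the next site is too far away.
        if current and start - current_end > max_gap:
            if len(current) >= min_sites:
                clusters.append(current)
            current = []
        end = start + width
        current_end = end if not current else max(current_end, end)
        current.append(site)
    if len(current) >= min_sites:
        clusters.append(current)
    return clusters


if __name__ == "__main__":
    hits = [(0, 2, "A"), (4, 2, "B"), (9, 2, "C"), (14, 2, "D"), (30, 2, "A")]
    for cluster in max_gap_clusters(hits, max_gap=3):
        span = cluster[-1][0] + cluster[-1][1] - cluster[0][0]
        print(len(cluster), "sites spanning", span, "bases")
```

The isolated hit at position 30 is dropped because it cannot reach any neighbour within the gap limit and a single site does not meet `min_sites`.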
Figure 3
Max-gap cluster – module with organizational constraints. Max-gap cluster of motifs as a module with organizational constraints. There are four motifs in this module that must occur in the order A,B,C,D. The spacings between the motifs are defined as sx and the orientations of the motifs are labeled with arrows above the motifs.
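
The organizational constraints in this caption (fixed motif order, per-motif orientation, and spacing between neighbours) can be expressed as a simple check on a candidate cluster. This is an illustrative sketch only: it tests whether the constraints hold and does not reproduce the probability correction mentioned in the abstract. Treating each spacing as an upper bound, encoding orientation as `"+"` or `"-"`, and the function name `matches_organization` are all assumptions.

```python
from typing import List, Sequence, Tuple

# (start, width, motif_name, strand), with strand "+" or "-".
Hit = Tuple[int, int, str, str]


def matches_organization(
    hits: List[Hit],
    order: Sequence[str],
    strands: Sequence[str],
    max_spacings: Sequence[int],
) -> bool:
    """Check motif order, orientation, and spacing for one candidate module.

    max_spacings[i] bounds the number of bases between motif i and motif
    i + 1; overlapping neighbours (negative spacing) are rejected.
    """
    if len(hits) != len(order):
        return False
    ordered = sorted(hits)  # left to right along the sequence
    for i, (start, _width, motif, strand) in enumerate(ordered):
        if motif != order[i] or strand != strands[i]:
            return False
        if i > 0:
            prev_start, prev_width, _, _ = ordered[i - 1]
            spacing = start - (prev_start + prev_width)
            if spacing < 0 or spacing > max_spacings[i - 1]:
                return False
    return True


if __name__ == "__main__":
    candidate = [(0, 2, "A", "+"), (4, 2, "B", "-"), (9, 2, "C", "+"), (14, 2, "D", "+")]
    print(matches_organization(candidate, "ABCD", "+-++", [3, 3, 3]))
```

Running the check after max-gap grouping keeps the two concerns separate: clustering proposes candidates, and the organizational test filters them.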
Figure 4
IFN-β enhancer. A screenshot of the IFN-β enhancer found with MODSTORM, shown as a track on the UCSC genome browser (Human Mar. 2006 (hg18) assembly [55]). The top track spans the entire length of the module and is labeled with the module significance. The individual motif occurrences are shown below. The predicted location of this site is consistent with the experimentally verified site.
