Methods for calculating the probabilities of finding patterns in sequences
- PMID: 2720468
- DOI: 10.1093/bioinformatics/5.2.89
Abstract
This paper describes the use of probability-generating functions for calculating the probabilities of finding motifs in nucleic acid and protein sequences. Equations and algorithms are given for calculating the probabilities associated with nine different ways of defining motifs. Comparisons are made with searches of random sequences. A higher-level structure, the pattern, is defined as a list of motifs together with the permitted ranges of spacing between its constituent motifs. Equations for calculating the expected numbers of matches to patterns are given.
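As a rough illustration of the kind of quantity the abstract refers to, the sketch below handles only the simplest case: a motif defined as one exact string, searched for in a random sequence whose residues are drawn independently with fixed frequencies. It does not reproduce the paper's probability-generating-function equations; it substitutes a small dynamic program over a prefix-matching automaton, which gives the same kind of exact probability for this one case. The function names, the `freqs` dictionary, and the independence assumption are all hypothetical choices made for the example.

```python
from math import prod


def expected_matches(motif, seq_len, freqs):
    """Expected number of exact matches to a non-empty `motif` in a random
    sequence of length `seq_len` (assumes independent residues with
    frequencies `freqs`). Follows from linearity of expectation."""
    positions = seq_len - len(motif) + 1
    if positions <= 0:
        return 0.0
    return positions * prod(freqs[c] for c in motif)


def prob_at_least_one(motif, seq_len, freqs):
    """Probability that a non-empty `motif` occurs at least once, computed
    by dynamic programming over prefix-match states (not the paper's
    generating-function method)."""
    m = len(motif)
    alphabet = list(freqs)

    def step(k, c):
        # Longest prefix of the motif that is a suffix of motif[:k] + c.
        s = motif[:k] + c
        for j in range(min(m, len(s)), 0, -1):
            if s.endswith(motif[:j]):
                return j
        return 0

    trans = [{c: step(k, c) for c in alphabet} for k in range(m)]

    # state[k]: probability that k motif characters are currently matched
    # and no full match has been seen so far.
    state = [0.0] * m
    state[0] = 1.0
    found = 0.0
    for _ in range(seq_len):
        new_state = [0.0] * m
        for k, p in enumerate(state):
            if p == 0.0:
                continue
            for c in alphabet:
                q = p * freqs[c]
                j = trans[k][c]
                if j == m:
                    found += q  # first full match; absorb this mass
                else:
                    new_state[j] += q
        state = new_state
    return found


if __name__ == "__main__":
    uniform_dna = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    print(expected_matches("GAATTC", 1000, uniform_dna))
    print(prob_at_least_one("GAATTC", 1000, uniform_dna))
```

The two functions answer different questions: the expected count ignores overlaps by construction, while the probability of at least one match depends on how the motif can overlap itself, which is why the second function tracks partial matches.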