Methods for calculating the probabilities of finding patterns in sequences
- PMID: 2720468
- DOI: 10.1093/bioinformatics/5.2.89
Abstract
This paper describes the use of probability-generating functions for calculating the probabilities of finding motifs in nucleic acid and protein sequences. Equations and algorithms are given for calculating the probabilities associated with nine different ways of defining motifs. Comparisons are made with searches of random sequences. A higher-level structure, the pattern, is defined as a list of motifs together with the permitted ranges of spacing between its constituent motifs. Equations for calculating the expected numbers of matches to patterns are given.
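As an illustration of the general idea only, the sketch below handles one kind of motif, a weight matrix with integer scores, under the assumption that sequence positions are independent and drawn from fixed background frequencies. Under that assumption the probability-generating function of the total score is the product of the per-position generating functions, which amounts to convolving the per-position score distributions. The function names, the toy matrix and the background frequencies are hypothetical and are not taken from the paper, which covers nine motif definitions that are not reproduced here.

```python
from collections import defaultdict


def score_distribution(weights, background):
    """Return {total_score: probability} for one random window.

    weights    -- list of dicts, one per motif position, symbol -> integer score
    background -- dict, symbol -> probability (assumed independent positions)
    """
    dist = {0: 1.0}  # generating function of an empty motif: P(score = 0) = 1
    for column in weights:
        nxt = defaultdict(float)
        # Multiplying generating functions = convolving score distributions.
        for total, p in dist.items():
            for symbol, w in column.items():
                nxt[total + w] += p * background[symbol]
        dist = dict(nxt)
    return dist


def tail_probability(dist, cutoff):
    """Probability that a random window scores at least `cutoff`."""
    return sum(p for score, p in dist.items() if score >= cutoff)


def expected_matches(p_match, seq_len, motif_len):
    """Expected number of matching windows in a random sequence."""
    return max(seq_len - motif_len + 1, 0) * p_match


if __name__ == "__main__":
    # Toy two-position matrix and uniform background (both hypothetical).
    background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
    weights = [
        {"A": 2, "C": 0, "G": 0, "T": 1},
        {"A": 0, "C": 3, "G": 0, "T": 0},
    ]
    dist = score_distribution(weights, background)
    p = tail_probability(dist, cutoff=4)
    print(p, expected_matches(p, seq_len=101, motif_len=len(weights)))
```

The expected count uses linearity of expectation over the `seq_len - motif_len + 1` windows, so it does not require the overlapping windows to be independent. Extending this to a pattern of several motifs with spacing ranges would need further assumptions not stated in the abstract.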
Similar articles
- Methods to define and locate patterns of motifs in sequences. Comput Appl Biosci. 1988 Mar;4(1):53-60. doi: 10.1093/bioinformatics/4.1.53. PMID: 2898280
- Calculating the exact probability of language-like patterns in biomolecular sequences. Proc Int Conf Intell Syst Mol Biol. 1998;6:17-24. PMID: 9783205
- Software tools for motif and pattern scanning: program descriptions including a universal sequence reading algorithm. Comput Appl Biosci. 1989 Jul;5(3):227-32. doi: 10.1093/bioinformatics/5.3.227. PMID: 2766008
- PROMOT: a FORTRAN program to scan protein sequences against a library of known motifs. Comput Appl Biosci. 1991 Apr;7(2):257-60. doi: 10.1093/bioinformatics/7.2.257. PMID: 2059852
- Discovering sequence motifs. Methods Mol Biol. 2007;395:271-92. doi: 10.1007/978-1-59745-514-5_17. PMID: 17993680. Review.
Cited by
- NEXT-peak: a normal-exponential two-peak model for peak-calling in ChIP-seq data. BMC Genomics. 2013 May 25;14:349. doi: 10.1186/1471-2164-14-349. PMID: 23706083. Free PMC article.
- MONKEY: identifying conserved transcription-factor binding sites in multiple alignments using a binding site-specific evolutionary model. Genome Biol. 2004;5(12):R98. doi: 10.1186/gb-2004-5-12-r98. Epub 2004 Nov 30. PMID: 15575972. Free PMC article.
- Compound poisson approximation of the number of occurrences of a position frequency matrix (PFM) on both strands. J Comput Biol. 2008 Jul-Aug;15(6):547-64. doi: 10.1089/cmb.2007.0084. PMID: 18631020. Free PMC article.
- The Staden sequence analysis package. Mol Biotechnol. 1996 Jun;5(3):233-41. doi: 10.1007/BF02900361. PMID: 8837029. Review.
- In silico identification and experimental validation of PmrAB targets in Salmonella typhimurium by regulatory motif detection. Genome Biol. 2004;5(2):R9. doi: 10.1186/gb-2004-5-2-r9. Epub 2004 Jan 29. PMID: 14759259. Free PMC article.