Improved sensitivity of profile searches through the use of sequence weights and gap excision
- PMID: 8193951
- DOI: 10.1093/bioinformatics/10.1.19
Abstract
Position-specific substitution matrices, known as profiles, derived from multiple sequence alignments are currently used to search sequence databases for distantly related members of protein families. The performance of the database searches is enhanced by using (i) a sequence weighting scheme which assigns higher weights to more distantly related sequences based on branch lengths derived from phylogenetic trees, (ii) exclusion of positions consisting mainly of gap characters at sites of insertions or deletions and (iii) the BLOSUM62 residue comparison matrix. A natural consequence of these modifications is an improvement in the alignment of new sequences to the profiles. However, the accuracy of the alignments can be further increased by employing a similarity residue comparison matrix. These developments are implemented in a program called PROFILEWEIGHT which runs on Unix and Vax computers. The only input required by the program is the multiple sequence alignment. The output from PROFILEWEIGHT is a profile designed to be used by existing searching and alignment programs. Test results from database searches with four different families of proteins show the improved sensitivity of the weighted profiles.
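The three ingredients listed in the abstract (tree-based sequence weights, exclusion of mainly-gap columns, and a substitution-matrix profile) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not PROFILEWEIGHT itself. It assumes that each branch length is shared equally among the sequences below it (a branch-length rule of this kind is also used for sequence weighting in CLUSTAL W), that a column counts as "mainly gaps" when more than half of its rows are gaps, and that each profile column is the weighted average of substitution-matrix rows for its residues, in the style of Gribskov profiles. The `Node` class, the function names, the 0.5 threshold and `toy_matrix` are hypothetical; a full BLOSUM62 table in the same nested-dictionary form would replace `toy_matrix`.

```python
from __future__ import annotations

from dataclasses import dataclass, field

GAP_CHARS = "-."  # assumed gap symbols in the input alignment


@dataclass
class Node:
    """Hypothetical rooted-tree node; `length` is the branch to its parent."""
    name: str = ""
    length: float = 0.0
    children: list[Node] = field(default_factory=list)


def leaf_count(node: Node) -> int:
    # A leaf is one sequence; an internal node counts the leaves beneath it.
    if not node.children:
        return 1
    return sum(leaf_count(child) for child in node.children)


def tree_weights(root: Node) -> dict[str, float]:
    """Assumed rule: weight of a leaf = sum over the branches on its path
    to the root of (branch length / number of leaves sharing that branch)."""
    weights: dict[str, float] = {}

    def walk(node: Node, acc: float) -> None:
        acc += node.length / leaf_count(node)  # share branch among its leaves
        if not node.children:
            weights[node.name] = acc
        for child in node.children:
            walk(child, acc)

    walk(root, 0.0)
    return weights


def kept_columns(alignment: dict[str, str], max_gap_fraction: float = 0.5) -> list[int]:
    """Keep columns whose fraction of gapped rows is at most the (assumed) threshold."""
    seqs = list(alignment.values())
    keep = []
    for c in range(len(seqs[0])):
        gaps = sum(seq[c] in GAP_CHARS for seq in seqs)
        if gaps / len(seqs) <= max_gap_fraction:
            keep.append(c)
    return keep


def weighted_profile(alignment, weights, matrix, alphabet, columns):
    """For each kept column, score[a] = weighted mean of matrix[r][a]
    over the non-gap residues r in that column."""
    profile = []
    for c in columns:
        num = dict.fromkeys(alphabet, 0.0)
        den = 0.0
        for name, seq in alignment.items():
            residue = seq[c]
            if residue in GAP_CHARS:
                continue  # gapped rows contribute nothing to this column
            w = weights[name]
            den += w
            for a in alphabet:
                num[a] += w * matrix[residue][a]
        profile.append({a: num[a] / den if den else 0.0 for a in alphabet})
    return profile


# Toy example: C sits alone on a long branch, so it should get the largest weight.
tree = Node(children=[
    Node(length=2.0, children=[Node("A", 1.0), Node("B", 1.0)]),
    Node("C", 4.0),
])
alignment = {"A": "AC-G", "B": "AC-G", "C": "AGCG"}
alphabet = "ACG"
# Toy match/mismatch matrix standing in for BLOSUM62 (hypothetical values).
toy_matrix = {r: {a: 2 if r == a else -1 for a in alphabet} for r in alphabet}

weights = tree_weights(tree)
cols = kept_columns(alignment)
profile = weighted_profile(alignment, weights, toy_matrix, alphabet, cols)
print(weights)     # {'A': 2.0, 'B': 2.0, 'C': 4.0}
print(cols)        # [0, 1, 3]
print(profile[1])  # {'A': -1.0, 'C': 0.5, 'G': 0.5}
```

In the toy tree, sequence C receives twice the weight of A or B, so at column 1 its single G contributes as much to the profile as the two Cs combined; an unweighted average would favour C. Column 2, gapped in two of the three rows, is dropped before the profile is built.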
Similar articles
- Using CLUSTAL for multiple sequence alignments. Methods Enzymol. 1996;266:383-402. doi: 10.1016/s0076-6879(96)66024-8. PMID: 8743695
- Searching for distantly related protein sequences in large databases by parallel processing on a transputer machine. Comput Appl Biosci. 1992 Feb;8(1):49-55. doi: 10.1093/bioinformatics/8.1.49. PMID: 1568125
- SALSA: improved protein database searching by a new algorithm for assembly of sequence fragments into gapped alignments. Bioinformatics. 1998;14(10):839-45. doi: 10.1093/bioinformatics/14.10.839. PMID: 9927712
- Protein database searches using compositionally adjusted substitution matrices. FEBS J. 2005 Oct;272(20):5101-9. doi: 10.1111/j.1742-4658.2005.04945.x. PMID: 16218944. Review.
- Identifying distantly related protein sequences. Comput Appl Biosci. 1997 Aug;13(4):325-32. doi: 10.1093/bioinformatics/13.4.325. PMID: 9283747. Review.
Cited by
- The construction and use of log-odds substitution scores for multiple sequence alignment. PLoS Comput Biol. 2010 Jul 15;6(7):e1000852. doi: 10.1371/journal.pcbi.1000852. PMID: 20657661
- Organization of the biosynthetic gene cluster for the polyketide anthelmintic macrolide avermectin in Streptomyces avermitilis. Proc Natl Acad Sci U S A. 1999 Aug 17;96(17):9509-14. doi: 10.1073/pnas.96.17.9509. PMID: 10449723
- MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics. 2004 Aug 19;5:113. doi: 10.1186/1471-2105-5-113. PMID: 15318951
- Bacterial alpha2-macroglobulins: colonization factors acquired by horizontal gene transfer from the metazoan genome? Genome Biol. 2004;5(6):R38. doi: 10.1186/gb-2004-5-6-r38. Epub 2004 May 26. PMID: 15186489
- Novel druggable hot spots in avian influenza neuraminidase H5N1 revealed by computational solvent mapping of a reduced and representative receptor ensemble. Chem Biol Drug Des. 2008 Feb;71(2):106-16. doi: 10.1111/j.1747-0285.2007.00614.x. Epub 2008 Jan 17. PMID: 18205727