Automatic generation of primary sequence patterns from sets of related protein sequences

Proc Natl Acad Sci U S A. 1990 Jan;87(1):118-22.

Authors

R F Smith¹, T F Smith

Affiliation

¹ Department of Biostatistics, Dana-Farber Cancer Institute, Boston, MA 02115.

PMID: 2296575
PMCID: PMC53211
DOI: 10.1073/pnas.87.1.118

Abstract

We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern.
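The pattern-construction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class hierarchy, the symbol names, the `g` gap character, and the helper names (`covering`, `merge`, `reduce_tree`, `to_regex`) are all assumptions made for illustration. The sketch also assumes the sequences are already aligned to equal length, so it skips the dynamic programming alignment and the "pay once" gap scoring, and it reduces a fixed tree recursively rather than by repeatedly picking the two most similar termini.

```python
import re

# Hypothetical nested hierarchy of amino acid classes (illustrative only; the
# abstract does not list the classes used in the paper). Any two classes are
# either disjoint or nested, so a smallest covering class always exists.
CLASSES = {
    "x": frozenset("ACDEFGHIKLMNPQRSTVWY"),  # any residue
    "p": frozenset("DEKRHNQST"),             # polar
    "h": frozenset("AVLIMFWYCGP"),           # hydrophobic or small
    "c": frozenset("DEKRH"),                 # charged
    "n": frozenset("DE"),                    # acidic
    "b": frozenset("KRH"),                   # basic
    "a": frozenset("FWY"),                   # aromatic
    "l": frozenset("VLIM"),                  # aliphatic
}
GAP = "g"  # assumed symbol for a gap character (0 or 1 residue of any type)


def members(sym):
    """Residues a pattern element stands for (class symbol or single residue)."""
    return CLASSES.get(sym, frozenset(sym))


def covering(a, b):
    """Smallest element (residue or class) that covers both a and b."""
    if a == b:
        return a
    need = members(a) | members(b)
    candidates = [(len(s), name) for name, s in CLASSES.items() if need <= s]
    return min(candidates)[1]


def merge(aligned_a, aligned_b):
    """Build one common pattern from two equal-length aligned strings."""
    assert len(aligned_a) == len(aligned_b)
    out = []
    for a, b in zip(aligned_a, aligned_b):
        if a in ("-", GAP) or b in ("-", GAP):
            out.append(GAP)  # gapped position becomes a gap character
        else:
            out.append(covering(a, b))
    return "".join(out)


def reduce_tree(node):
    """Collapse a binary tree (nested 2-tuples of strings) to a root pattern."""
    if isinstance(node, str):
        return node
    left, right = node
    return merge(reduce_tree(left), reduce_tree(right))


def to_regex(pattern):
    """Exact-match view of a pattern; the paper scores alignments instead."""
    any_residue = "[" + "".join(sorted(CLASSES["x"])) + "]"
    parts = []
    for sym in pattern:
        if sym == GAP:
            parts.append(any_residue + "?")  # zero or one residue of any type
        elif sym in CLASSES:
            parts.append("[" + "".join(sorted(CLASSES[sym])) + "]")
        else:
            parts.append(sym)
    return re.compile("".join(parts))


tree = (("ADKFL", "AEKYL"), "AERW-")
root = reduce_tree(tree)
print(root)  # Anbag
```

In this toy example the root pattern `Anbag` covers all three input sequences, and `to_regex(root).fullmatch("AEHF")` also matches, because the trailing gap character may stand for no residue at all.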

Cited by

Protein database searches for multiple alignments.
Altschul SF, Lipman DJ. Proc Natl Acad Sci U S A. 1990 Jul;87(14):5509-13. doi: 10.1073/pnas.87.14.5509. PMID: 2196570. Free PMC article.
Structure and function of tyrosine kinase receptors.
White MF. J Bioenerg Biomembr. 1991 Feb;23(1):63-82. doi: 10.1007/BF00768839. PMID: 1849136. Review.
An Eulerian path approach to local multiple alignment for DNA sequences.
Zhang Y, Waterman MS. Proc Natl Acad Sci U S A. 2005 Feb 1;102(5):1285-90. doi: 10.1073/pnas.0409240102. Epub 2005 Jan 24. PMID: 15668398. Free PMC article.
Searching databases of conserved sequence regions by aligning protein multiple-alignments.
Pietrokovski S. Nucleic Acids Res. 1996 Oct 1;24(19):3836-45. doi: 10.1093/nar/24.19.3836. PMID: 8871566. Free PMC article.
Automated assembly of protein blocks for database searching.
Henikoff S, Henikoff JG. Nucleic Acids Res. 1991 Dec 11;19(23):6565-72. doi: 10.1093/nar/19.23.6565. PMID: 1754394. Free PMC article.

Grants and funding

RR02275/RR/NCRR NIH HHS/United States
