Proc Natl Acad Sci U S A. 1990 Jan;87(1):118-22.
doi: 10.1073/pnas.87.1.118.

Automatic generation of primary sequence patterns from sets of related protein sequences

R F Smith et al. Proc Natl Acad Sci U S A. 1990 Jan.

Abstract

We have developed a computer algorithm that can extract the pattern of conserved primary sequence elements common to all members of a homologous protein family. The method involves clustering the pairwise similarity scores among a set of related sequences to generate a binary dendrogram (tree). The tree is then reduced in a stepwise manner by progressively replacing the node connecting the two most similar termini by one common pattern until only a single common "root" pattern remains. A pattern is generated at a node by (i) performing a local optimal alignment on the sequence/pattern pair connected by the node with the use of an extended dynamic programming algorithm and then (ii) constructing a single common pattern from this alignment with a nested hierarchy of amino acid classes to identify the minimal inclusive amino acid class covering each paired set of elements in the alignment. Gaps within an alignment are created and/or extended using a "pay once" gap penalty rule, and gapped positions are converted into gap characters that function as 0 or 1 amino acid of any type during subsequent alignment. This method has been used to generate a library of covering patterns for homologous families in the National Biomedical Research Foundation/Protein Identification Resource protein sequence data base. We show that a covering pattern can be more diagnostic for sequence family membership than any of the individual sequences used to construct the pattern.
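The sketch below illustrates step (ii), building one common pattern per tree node by taking the minimal inclusive amino acid class of each aligned pair, and the bottom-up reduction of the dendrogram to a single root pattern. It is not the authors' implementation. Several things are assumptions: the class hierarchy and its one-letter class names (`a`, `h`, `k`, `x`, ...) are hypothetical and are not the one used in the paper; `*` is a made-up symbol for the gap character that matches 0 or 1 residue of any type; and the input rows are assumed to be already aligned to equal length. That last assumption means step (i), the extended dynamic-programming local alignment with its "pay once" gap penalty, is skipped entirely.

```python
from __future__ import annotations

# Hypothetical nested hierarchy of amino acid classes (NOT the paper's actual
# hierarchy). Each symbol maps to its parent class; "x" (any residue) is the root.
PARENT = {
    "I": "a", "L": "a", "V": "a", "M": "a",   # a: aliphatic
    "F": "r", "W": "r", "Y": "r",             # r: aromatic
    "a": "h", "r": "h", "A": "h", "C": "h",   # h: hydrophobic
    "K": "k", "R": "k", "H": "k",             # k: basic
    "D": "d", "E": "d",                       # d: acidic
    "k": "c", "d": "c",                       # c: charged
    "N": "n", "Q": "n",                       # n: amide
    "S": "o", "T": "o",                       # o: hydroxyl
    "c": "p", "n": "p", "o": "p",             # p: polar
    "G": "s", "P": "s",                       # s: special
    "h": "x", "p": "x", "s": "x",             # x: any residue (root)
}

ALIGN_GAP = "-"    # gap inserted by the (omitted) alignment step
PATTERN_GAP = "*"  # hypothetical gap character: 0 or 1 residue of any type


def lineage(symbol: str) -> list[str]:
    """Return the symbol followed by each enclosing class, up to the root."""
    path = [symbol]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def minimal_class(a: str, b: str) -> str:
    """Smallest class in the hierarchy that covers both aligned symbols."""
    # Any gapped position becomes the pattern gap character.
    if ALIGN_GAP in (a, b) or PATTERN_GAP in (a, b):
        return PATTERN_GAP
    ancestors_of_b = set(lineage(b))
    # Walk upward from a; the first shared class is the lowest common ancestor.
    for cls in lineage(a):
        if cls in ancestors_of_b:
            return cls
    raise ValueError(f"unknown symbol in pair {a!r}, {b!r}")


def merge(row1: str, row2: str) -> str:
    """Build one common pattern from two equal-length aligned rows."""
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have equal length")
    return "".join(minimal_class(a, b) for a, b in zip(row1, row2))


def reduce_tree(node) -> str:
    """Collapse a binary dendrogram, given as nested 2-tuples of aligned rows,
    into a single root pattern. Each internal node is replaced by the merged
    pattern of its two children, deepest nodes first."""
    if isinstance(node, str):
        return node
    left, right = node
    return merge(reduce_tree(left), reduce_tree(right))


if __name__ == "__main__":
    tree = (("GKLVE", "GRIVD"), "GKMF-")
    print(reduce_tree(tree))
```

With this toy hierarchy, the two most similar rows `GKLVE` and `GRIVD` merge into `GkaVd` (K/R become basic `k`, L/I become aliphatic `a`, E/D become acidic `d`), and merging that pattern with `GKMF-` gives the root pattern `Gkah*`. For a fixed tree, the recursion visits every internal node after both of its children, which produces the same root pattern as replacing nodes one at a time from the most similar pair upward.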
