WITCH: Improved Multiple Sequence Alignment Through Weighted Consensus Hidden Markov Model Alignment

J Comput Biol. 2022 Aug;29(8):782-801. doi: 10.1089/cmb.2021.0585. Epub 2022 May 17.

Chengze Shen¹, Minhyuk Park¹, Tandy Warnow¹

PMID: 35575747
DOI: 10.1089/cmb.2021.0585

Chengze Shen et al. J Comput Biol. 2022 Aug.

Affiliation

¹ Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, Illinois, USA.

Abstract

Accurate multiple sequence alignment is challenging on many data sets, including those that are large, evolve at high rates, or have sequence length heterogeneity. While substantial progress has been made over the last decade in addressing the first two challenges, sequence length heterogeneity remains a significant issue for many data sets. Sequence length heterogeneity occurs for biological and technological reasons, including large insertions or deletions (indels) that occurred in the evolutionary history relating the sequences, or the inclusion of sequences that are not fully assembled. Ultra-large alignments using Phylogeny-Aware Profiles (UPP) (Nguyen et al. 2015) is one of the most accurate approaches for aligning data sets that exhibit sequence length heterogeneity: it constructs an alignment on the subset of sequences it considers "full-length," represents this "backbone alignment" using an ensemble of hidden Markov models (HMMs), and then adds each remaining sequence into the backbone alignment based on an HMM selected for that sequence from the ensemble. Our new method, WeIghTed Consensus Hmm alignment (WITCH), improves on UPP in three important ways: first, it uses a statistically principled technique to weight and rank the HMMs; second, it uses $k > 1$ HMMs from the ensemble rather than a single HMM; and third, it combines the alignments for each of the selected HMMs using a consensus algorithm that takes the weights into account. We show that this approach provides improved alignment accuracy compared with UPP and other leading alignment methods, as well as improved accuracy for maximum likelihood trees based on these alignments.

Keywords: divide and conquer; hidden Markov model; multiple sequence alignment.
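The three steps named in the abstract (weight the HMMs, keep the top $k$, merge their alignments with the weights) can be illustrated with a small sketch. It is not the WITCH implementation. The abstract does not give the weighting formula or the consensus algorithm, so the sketch assumes a base-2 softmax over per-HMM bit scores and a simple per-character weighted vote. All function names, HMM names, and numbers below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def normalize_weights(bit_scores: Dict[str, float]) -> Dict[str, float]:
    """Map per-HMM bit scores to weights summing to 1.

    Assumption: a base-2 softmax stands in for the paper's weighting.
    """
    top = max(bit_scores.values())  # subtract the max for numerical stability
    raw = {name: 2.0 ** (score - top) for name, score in bit_scores.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def top_k(weights: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k highest-weighted HMMs and renormalize their weights."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)[:k]
    total = sum(weight for _, weight in ranked)
    return {name: weight / total for name, weight in ranked}


def weighted_vote(
    placements: Dict[str, List[Optional[int]]],
    weights: Dict[str, float],
) -> List[Optional[int]]:
    """For each query character, pick the backbone column with the most weight.

    placements[name][i] is the backbone column that HMM `name` assigns to
    query character i, or None if that HMM treats it as an insertion.
    This simple vote does not enforce increasing column order.
    """
    length = len(next(iter(placements.values())))
    consensus: List[Optional[int]] = []
    for position in range(length):
        votes: Dict[Optional[int], float] = defaultdict(float)
        for name, weight in weights.items():
            votes[placements[name][position]] += weight
        consensus.append(max(votes, key=votes.get))
    return consensus


# Hypothetical bit scores of one query sequence against three HMMs.
bit_scores = {"hmm_a": 10.0, "hmm_b": 9.0, "hmm_c": 2.0}
selected = top_k(normalize_weights(bit_scores), k=2)

# Hypothetical column assignments for a three-character query.
placements = {
    "hmm_a": [3, 4, None],
    "hmm_b": [3, 5, 6],
    "hmm_c": [1, 2, 3],
}
consensus = weighted_vote(placements, selected)
print(selected)
print(consensus)
```

With $k = 2$, the low-scoring `hmm_c` is dropped, and wherever the two remaining HMMs disagree the higher-weighted one decides. A vote of this kind can place characters in columns that are out of order, which is one reason a real method needs a more careful consensus step.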
