HMM-FRAME: Accurate protein domain classification for metagenomic sequences containing frameshift errors

52Citations
Citations of this article
111Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Background: Protein domain classification is an important step in metagenomic annotation. The state-of-the-art method for protein domain classification is profile HMM-based alignment. However, the relatively high rates of insertions and deletions in homopolymer regions of pyrosequencing reads create frameshifts, causing conventional profile HMM alignment tools to generate alignments with marginal scores. This makes error-containing gene fragments unclassifiable with conventional tools. Thus, there is a need for an accurate domain classification tool that can detect and correct sequencing errors.Results: We introduce HMM-FRAME, a protein domain classification tool based on an augmented Viterbi algorithm that can incorporate error models from different sequencing platforms. HMM-FRAME corrects sequencing errors and classifies putative gene fragments into domain families. It achieved high error detection sensitivity and specificity in a data set with annotated errors. We applied HMM-FRAME in Targeted Metagenomics and a published metagenomic data set. The results showed that our tool can correct frameshifts in error-containing sequences, generate much longer alignments with significantly smaller E-values, and classify more sequences into their native families.Conclusions: HMM-FRAME provides a complementary protein domain classification tool to conventional profile HMM-based methods for data sets containing frameshifts. Its current implementation is best used for small-scale metagenomic data sets. The source code of HMM-FRAME can be downloaded at http://www.cse.msu.edu/~zhangy72/hmmframe/ and at https://sourceforge.net/projects/hmm-frame/. © 2011 Zhang and Sun; licensee BioMed Central Ltd.

Cite

CITATION STYLE

APA

Zhang, Y., & Sun, Y. (2011). HMM-FRAME: Accurate protein domain classification for metagenomic sequences containing frameshift errors. BMC Bioinformatics, 12. https://doi.org/10.1186/1471-2105-12-198

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free