Using protein domains to improve the accuracy of Ab Initio gene finding

Abstract

Background: Protein domains are the common functional elements used by nature to generate tremendous diversity among proteins, and they are used repeatedly in different combinations across all major domains of life. In this paper we address the problem of using similarity to known protein domains to help identify genes in a DNA sequence. We have adapted the generalized hidden Markov model (GHMM) architecture of the ab initio gene finder GlimmerHMM so that a higher probability is assigned to exons that contain homologues of protein domains. To our knowledge, this domain-homology-based approach has not been used previously in the context of ab initio gene prediction.

Results: GlimmerHMM was augmented with a protein domain module that recognizes gene structures that are similar to Pfam models. The augmented system, GlimmerHMM+, shows a 2% improvement in sensitivity and a 1% increase in specificity in predicting exact gene structures compared to GlimmerHMM without this option. These results were obtained on two very different model organisms: Arabidopsis thaliana (mustard weed) and Danio rerio (zebrafish). Together, these preliminary results demonstrate the value of using protein domain homology in gene prediction. The results obtained are encouraging, and we believe that a more comprehensive approach, including a model that reflects the statistical characteristics of specific sets of protein domain families, would further increase the accuracy of gene prediction. GlimmerHMM and GlimmerHMM+ are freely available as open source software at http://cbcb.umd.edu/software.

© Springer-Verlag Berlin Heidelberg 2007.
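The abstract does not give the scoring formula, so the following is only a toy sketch of the general idea of assigning a higher score to exons that overlap a protein domain match. It is not the GlimmerHMM+ implementation. The names `Exon`, `DomainHit`, `boosted_score`, the `weight` parameter, and the choice to scale the bonus by the fraction of the hit covered are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Exon:
    start: int        # 0-based, inclusive
    end: int          # exclusive
    log_prob: float   # hypothetical base log-score from the gene finder


@dataclass(frozen=True)
class DomainHit:
    start: int        # genomic coordinates of the domain match
    end: int
    log_odds: float   # hypothetical log-odds score of the match


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Number of positions shared by two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def boosted_score(exon: Exon, hits: list[DomainHit], weight: float = 1.0) -> float:
    """Base exon score plus a bonus for each overlapping domain hit.

    Assumption: the bonus is the hit's log-odds score scaled by the
    fraction of the hit that the exon covers, times a tunable weight.
    """
    bonus = 0.0
    for hit in hits:
        hit_len = hit.end - hit.start
        if hit_len <= 0:
            continue  # skip malformed hits
        covered = overlap(exon.start, exon.end, hit.start, hit.end) / hit_len
        bonus += weight * hit.log_odds * covered
    return exon.log_prob + bonus


# Two candidate exons with equal base scores; only the first overlaps a hit.
hits = [DomainHit(start=150, end=250, log_odds=4.0)]
with_domain = Exon(start=100, end=200, log_prob=-10.0)
without_domain = Exon(start=300, end=400, log_prob=-10.0)

score_with = boosted_score(with_domain, hits)
score_without = boosted_score(without_domain, hits)
```

In this sketch the exon that covers half of the domain hit ends up with a higher score than an otherwise identical exon with no domain support, which is the qualitative behavior the abstract describes.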

Cite

Pertea, M., & Salzberg, S. L. (2007). Using protein domains to improve the accuracy of Ab Initio gene finding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4645 LNBI, pp. 208–215). Springer Verlag. https://doi.org/10.1007/978-3-540-74126-8_20
