Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining

Ross D. King; Andreas Karwath; Amanda Clare; Luc Dehaspe

Journal Article, Open Access

Yeast (2000) 17(4) 283-293

DOI: 10.1002/1097-0061(200012)17:4<283::aid-yea52>3.0.co;2-f

55 Citations

40 Readers

Abstract

The analysis of genomics data needs to become as automated as its generation. Here we present a novel data-mining approach to predicting protein functional class from sequence. This method is based on a combination of inductive logic programming clustering and rule learning. We demonstrate the effectiveness of this approach on the M. tuberculosis and E. coli genomes, and identify biologically interpretable rules which predict protein functional class using only information available from the sequence. These rules predict 65% of the ORFs with no assigned function in M. tuberculosis and 24% of those in E. coli, with an estimated accuracy of 60-80% (depending on the level of functional assignment). The rules are founded on a combination of detection of remote homology, convergent evolution and horizontal gene transfer. We identify rules that predict protein functional class even in the absence of detectable sequence or structural homology. These rules give insight into the evolutionary history of M. tuberculosis and E. coli. Copyright © 2000 John Wiley & Sons, Ltd.

Citation (APA)

King, R. D., Karwath, A., Clare, A., & Dehaspe, L. (2000). Accurate prediction of protein functional class from sequence in the Mycobacterium tuberculosis and Escherichia coli genomes using data mining. Yeast, 17(4), 283–293. https://doi.org/10.1002/1097-0061(200012)17:4<283::aid-yea52>3.0.co;2-f
