A Hierarchical n-grams extraction approach for classification problem


Abstract

We are interested in classifying proteins based on their primary structures. The goal is to automatically assign protein sequences to their families. This task requires extracting a set of descriptors that are then presented to supervised learning algorithms. Many types of descriptors are used in the literature; the most popular is the n-gram, a sequence of n consecutive characters. The standard n-gram approach fixes the parameter n first, extracts the corresponding n-gram descriptors, and works with this single value throughout the data mining process. In this paper, we propose a hierarchical approach to n-gram construction. The goal is to obtain descriptors of varying length that better characterize the protein families. This approach reflects the domain knowledge of biologists: the patterns that characterize a protein family usually vary in length. Our idea is to transpose the frequent itemset extraction principle, mainly used in association rule mining, to n-gram extraction for protein classification. Experiments show that the new approach is consistent with biological reality and achieves the same accuracy as the standard approach. © Springer-Verlag Berlin Heidelberg 2009.
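The idea of growing n-grams level by level, in the manner of frequent itemset mining, can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes an Apriori-style rule in which an (n+1)-gram is considered only if its length-n prefix and suffix were both frequent at the previous level, and it assumes that support means the fraction of sequences containing the n-gram. The names `ngrams`, `hierarchical_ngrams`, `min_support` and `max_n` are hypothetical, and the toy sequences are made up for illustration.

```python
from collections import Counter


def ngrams(sequence, n):
    """Return the set of distinct n-grams occurring in one sequence."""
    return {sequence[i:i + n] for i in range(len(sequence) - n + 1)}


def hierarchical_ngrams(sequences, min_support=0.5, max_n=5):
    """Apriori-style extraction of frequent n-grams of varying length.

    Assumption: support of an n-gram = number of sequences containing it.
    An (n+1)-gram is counted only if its length-n prefix and suffix
    were both frequent at the previous level.
    """
    threshold = min_support * len(sequences)
    frequent = {}      # n-gram -> number of sequences containing it
    previous = None    # frequent grams found at the previous level
    for n in range(1, max_n + 1):
        counts = Counter()
        for seq in sequences:
            for gram in ngrams(seq, n):
                # Level 1 has no pruning; later levels prune candidates.
                if previous is None or (
                    gram[:-1] in previous and gram[1:] in previous
                ):
                    counts[gram] += 1
        current = {g for g, c in counts.items() if c >= threshold}
        if not current:
            break      # no frequent gram of this length, so stop growing
        frequent.update({g: counts[g] for g in current})
        previous = current
    return frequent


# Toy sequences (hypothetical data, not taken from the paper).
toy = ["MKVLAA", "MKVLGG", "TTMKVL", "GGAAKV"]
result = hierarchical_ngrams(toy, min_support=0.75, max_n=4)
```

On this toy input the result mixes descriptors of lengths 1 to 4, such as `KV` and `MKVL`, which is the kind of variable-length descriptor set the abstract describes. Each retained n-gram could then serve as a feature for a supervised learner.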

Cite

Mhamdi, F., Rakotomalala, R., & Elloumi, M. (2009). A Hierarchical n-grams extraction approach for classification problem. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4879 LNCS, pp. 211–222). https://doi.org/10.1007/978-3-642-01350-8_20
