Representations for large-scale sequence data mining: A tale of two vector space models

Vijay V. Raghavan; Ryan G. Benton; Tom Johnsten; Ying Xie

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 8170 LNAI 15-25

DOI: 10.1007/978-3-642-41218-9_3

Abstract

Analyzing and classifying sequence data based on structural similarities and differences is a mathematical problem of escalating relevance. Indeed, a primary challenge in designing machine learning algorithms to analyze sequence data is the extraction and representation of significant features. This paper introduces a generalized sequence feature extraction model, referred to as the Generalized Multi-Layered Vector Spaces (GMLVS) model. Unlike most models, which represent sequence data by subsequence frequency, the GMLVS model represents a given sequence as a collection of features, where each individual feature captures the spatial relationships between two subsequences and can be mapped into a feature vector. The utility of this approach is demonstrated via two special cases of the GMLVS model, namely, Lossless Decomposition (LD) and Multi-Layered Vector Spaces (MLVS). Experimental evaluation shows that the GMLVS-inspired models generate feature vectors that, combined with basic machine learning techniques, achieve high classification performance. © 2013 Springer-Verlag.
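
The contrast the abstract draws, between counting subsequences and recording how pairs of subsequences are positioned relative to each other, can be illustrated with a small sketch. This is not the GMLVS, LD, or MLVS definition from the paper, which the abstract does not spell out. The function names, the use of fixed-length k-mers, and the choice of "offset between start positions" as the spatial relationship are all assumptions made here for illustration.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Frequency-style baseline: count every length-k subsequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def pair_offset_features(seq, k, max_offset):
    """Hypothetical pairwise features (an assumption, not the paper's model).

    Counts triples (left k-mer, right k-mer, offset), where offset is the
    distance between the two start positions, for offsets 1..max_offset.
    """
    feats = Counter()
    n = len(seq) - k + 1  # number of k-mer start positions
    for i in range(n):
        for offset in range(1, max_offset + 1):
            j = i + offset
            if j >= n:
                break
            feats[(seq[i:i + k], seq[j:j + k], offset)] += 1
    return feats


def to_vector(feats, vocabulary):
    """Map a feature Counter onto a fixed, ordered feature vocabulary."""
    return [feats.get(f, 0) for f in vocabulary]


if __name__ == "__main__":
    a, b = "ABAB", "AABB"
    # Same symbol frequencies, so the frequency baseline cannot separate them.
    print(kmer_counts(a, 1) == kmer_counts(b, 1))
    # The pairwise features differ because they record relative positions.
    fa = pair_offset_features(a, 1, 2)
    fb = pair_offset_features(b, 1, 2)
    vocab = sorted(set(fa) | set(fb))
    print(to_vector(fa, vocab))
    print(to_vector(fb, vocab))
```

In this toy setting, two sequences with identical symbol counts receive different feature vectors once pairwise positional information is included, which is the general kind of distinction the abstract attributes to features built on relationships between subsequences.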

Citation (APA)

Raghavan, V. V., Benton, R. G., Johnsten, T., & Xie, Y. (2013). Representations for large-scale sequence data mining: A tale of two vector space models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8170 LNAI, pp. 15–25). https://doi.org/10.1007/978-3-642-41218-9_3
