Representations for large-scale sequence data mining: A tale of two vector space models

Abstract

Analyzing and classifying sequence data based on structural similarities and differences is a mathematical problem of escalating relevance. Indeed, a primary challenge in designing machine learning algorithms to analyze sequence data is the extraction and representation of significant features. This paper introduces a generalized sequence feature extraction model, referred to as the Generalized Multi-Layered Vector Spaces (GMLVS) model. Unlike most models, which represent sequence data based on subsequence frequency, the GMLVS model represents a given sequence as a collection of features, where each individual feature captures the spatial relationship between two subsequences and can be mapped into a feature vector. The utility of this approach is demonstrated via two special cases of the GMLVS model, namely Lossless Decomposition (LD) and Multi-Layered Vector Spaces (MLVS). Experimental evaluation shows that the GMLVS-inspired models generate feature vectors that, combined with basic machine learning techniques, achieve high classification performance. © 2013 Springer-Verlag.
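To make the core idea concrete, the sketch below illustrates one plausible reading of a pairwise "spatial relationship" feature: for two subsequences, build a small histogram of the gaps between their occurrences in a sequence. This is a minimal illustration only; the function names (`occurrences`, `pair_feature`) and the gap-histogram encoding are assumptions for exposition, not the paper's actual GMLVS definition.

```python
from itertools import product

def occurrences(seq, sub):
    """Start positions of (possibly overlapping) occurrences of sub in seq."""
    return [i for i in range(len(seq) - len(sub) + 1) if seq[i:i + len(sub)] == sub]

def pair_feature(seq, a, b, max_gap=5):
    """Hypothetical pairwise spatial feature: vec[g] counts occurrence pairs
    where subsequence b starts g positions after subsequence a ends."""
    vec = [0] * (max_gap + 1)
    for i, j in product(occurrences(seq, a), occurrences(seq, b)):
        gap = j - (i + len(a))
        if 0 <= gap <= max_gap:
            vec[gap] += 1
    return vec

# Example: in "ACGTACGT", "GT" follows "AC" immediately twice and at gap 4 once.
print(pair_feature("ACGTACGT", "AC", "GT"))
```

Concatenating such vectors over many subsequence pairs yields a fixed-length representation that off-the-shelf classifiers can consume, which is the general flavor of feature the abstract describes.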


CITATION STYLE

APA

Raghavan, V. V., Benton, R. G., Johnsten, T., & Xie, Y. (2013). Representations for large-scale sequence data mining: A tale of two vector space models. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8170 LNAI, pp. 15–25). https://doi.org/10.1007/978-3-642-41218-9_3
