Generalized similarity kernels for efficient sequence classification

Abstract

String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. They often exhibit state-of-the-art performance on tasks such as document topic elucidation, music genre classification, and protein superfamily and fold prediction. However, typical string kernel methods rely on symbolic Hamming-distance-based matching, which may not necessarily reflect the underlying (e.g., physical) similarity between sequence fragments. In this work we propose a novel computational framework that uses general similarity metrics S(·; ·) and distance-preserving embeddings with string kernels to improve sequence classification. In particular, we consider two approaches that allow one either to incorporate non-Hamming similarity S(·; ·) into similarity evaluation by matching only the features that are similar according to S(·; ·), or to retain actual (approximate) similarity/distance scores in similarity evaluation. An embedding step, a distance-preserving bit-string mapping, is used to effectively capture similarity between otherwise symbolically different sequence elements. We show that it is possible to retain the computational efficiency of string kernels while using this more "precise" measure of similarity. We then demonstrate that on a number of sequence classification tasks, such as music and biological sequence classification, the new method can substantially improve upon state-of-the-art string kernel baselines. Copyright © 2012 by the Society for Industrial and Applied Mathematics.
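
The embedding idea in the abstract can be illustrated with a small Python sketch. It is not the authors' algorithm: it assumes sequence elements are small non-negative integers, uses a thermometer (unary) code as one possible distance-preserving bit-string mapping, and compares k-mers by brute force rather than with the efficient evaluation the paper describes. All function names and parameters (`unary_embed`, `similarity_score`, `levels`, `max_dist`) are hypothetical.

```python
def unary_embed(value, levels):
    """Thermometer code (an assumed embedding): integer in [0, levels] -> bit tuple.

    The Hamming distance between two codes equals |a - b|, so numerically
    close symbols get bit-strings that differ in few positions.
    """
    return tuple(1 if i < value else 0 for i in range(levels))


def hamming(a, b):
    """Number of positions at which two equal-length bit tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def kmers(seq, k):
    """All contiguous length-k fragments of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed_kmer(kmer, levels):
    """Concatenate the bit codes of every element in a k-mer."""
    bits = []
    for v in kmer:
        bits.extend(unary_embed(v, levels))
    return tuple(bits)


def similarity_score(x, y, k, levels, max_dist):
    """Count k-mer pairs whose embedded Hamming distance is at most max_dist.

    Brute-force O(|x| * |y|) illustration only. With max_dist = 0 it reduces
    to counting exactly matching k-mer pairs.
    """
    ex = [embed_kmer(m, levels) for m in kmers(x, k)]
    ey = [embed_kmer(m, levels) for m in kmers(y, k)]
    return sum(1 for a in ex for b in ey if hamming(a, b) <= max_dist)


x = [0, 1, 2, 3]
y = [0, 1, 2, 4]
exact = similarity_score(x, y, k=2, levels=4, max_dist=0)
relaxed = similarity_score(x, y, k=2, levels=4, max_dist=1)
print(exact, relaxed)
```

In this toy input, the fragments `[2, 3]` and `[2, 4]` differ symbolically but lie one bit apart after embedding, so the relaxed score counts them as similar while exact symbolic matching does not.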

Cite

Kuksa, P. P., Khan, I., & Pavlovic, V. (2012). Generalized similarity kernels for efficient sequence classification. In Proceedings of the 12th SIAM International Conference on Data Mining, SDM 2012 (pp. 873–882). https://doi.org/10.1137/1.9781611972825.75
