Fast kernel methods for SVM sequence classifiers

Pavel Kuksa; Vladimir Pavlovic

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4645 LNBI 228-239

DOI: 10.1007/978-3-540-74126-8_22

Abstract

In this work we study string kernel methods for sequence analysis and focus on the problem of species-level identification based on short DNA fragments known as barcodes. We introduce efficient sorting-based algorithms for exact string k-mer kernels and then describe a divide-and-conquer technique for kernels with mismatches. Our algorithms for mismatch kernel matrix computation improve on the currently known time bounds for these computations. We then consider the mismatch kernel problem with feature selection and present efficient algorithms for it. Our experimental results show that, for string kernels with mismatches, kernel matrices can be computed 100-200 times faster than with traditional approaches. Kernel vector evaluations on new sequences show similar computational improvements. On several DNA barcode datasets, k-mer string kernels considerably improve identification accuracy compared to prior results. String kernels with feature selection demonstrate competitive performance with substantially fewer computations. © Springer-Verlag Berlin Heidelberg 2007.
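To illustrate the general idea behind a sorting-based exact k-mer kernel, here is a minimal Python sketch. It assumes the standard spectrum-kernel definition K(x, y) = Σ_a c_x(a) · c_y(a), where c_x(a) is the number of times k-mer a occurs in sequence x. The function name `kmer_spectrum_kernel` is hypothetical; this is an illustration of the sort-then-scan principle, not the authors' implementation, and it does not cover their divide-and-conquer algorithm for mismatch kernels.

```python
from collections import Counter
from itertools import groupby


def kmer_spectrum_kernel(seqs, k):
    """Exact k-mer (spectrum) kernel matrix computed by sorting.

    Hypothetical helper: K[i][j] = sum over k-mers a of
    count_i(a) * count_j(a).
    """
    n = len(seqs)
    # 1. List every (k-mer, sequence index) pair; sequences shorter than k
    #    contribute nothing because the range is empty.
    pairs = [(s[p:p + k], i)
             for i, s in enumerate(seqs)
             for p in range(len(s) - k + 1)]
    # 2. Sort so that identical k-mers become adjacent.
    pairs.sort()
    K = [[0] * n for _ in range(n)]
    # 3. Scan each run of identical k-mers, count occurrences per sequence,
    #    and add the product of counts for every pair of sequences in the run.
    for _, run in groupby(pairs, key=lambda t: t[0]):
        counts = list(Counter(i for _, i in run).items())
        for i, ci in counts:
            for j, cj in counts:
                K[i][j] += ci * cj
    return K
```

For example, with `["ACGT", "ACGA", "TTTT"]` and `k = 2`, the first two sequences share the 2-mers `AC` and `CG`, so their kernel value is 2, while `TTTT` contains `TT` three times and therefore has a self-similarity of 9. Sorting costs O(N log N) in the total number of k-mers N, and only sequences that actually share a k-mer are ever paired during the scan.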

Citation (APA)

Kuksa, P., & Pavlovic, V. (2007). Fast kernel methods for SVM sequence classifiers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4645 LNBI, pp. 228–239). Springer Verlag. https://doi.org/10.1007/978-3-540-74126-8_22