We have previously shown [8] that the LZ78 parse length can be used effectively for a music classification task. The parse length is used to compute a normalized information distance [6,7], which in turn drives a simple classifier. In this paper we explore a more subtle use of the LZ78 parsing algorithm. Instead of simply counting the parse length of a string, we use the coding dictionary constructed by LZ78 to derive a valid string kernel for a Support Vector Machine (SVM). The kernel is defined over a feature space indexed by all the phrases identified by our (modified) LZ78 compression algorithm. We report experiments with our kernel approach on two datasets: (i) a collection of MIDI files and (ii) Reuters-21578. We compare our technique with an n-gram based kernel. Our results indicate that the LZ78 kernel performs comparably to the best n-gram kernel, but with significantly lower computational overhead and without requiring a search for the optimal value of n. © Springer-Verlag Berlin Heidelberg 2005.
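To make the idea of a feature space indexed by LZ78 phrases concrete, the following is a minimal sketch, not the authors' method. It assumes a plain (unmodified) LZ78 parse, a binary feature per phrase, and cosine normalization; the abstract does not specify the modification to LZ78 or the feature weighting, and the function names `lz78_phrases` and `lz78_kernel` are hypothetical. Because the value is an inner product of explicit (normalized) feature vectors, the resulting function is a valid kernel.

```python
import math


def lz78_phrases(s):
    """Return the set of phrases found by a plain LZ78 parse of s.

    Assumption: standard LZ78, not the modified variant used in the paper.
    """
    phrases = set()
    current = ""
    for ch in s:
        current += ch
        if current not in phrases:
            # Shortest prefix not yet in the dictionary: record it as a phrase.
            phrases.add(current)
            current = ""
    # Any leftover text is already a dictionary phrase, so the set is complete.
    return phrases


def lz78_kernel(s, t):
    """Cosine-normalized count of LZ78 phrases shared by s and t.

    Assumption: binary phrase-presence features; the paper's weighting may differ.
    """
    ps, pt = lz78_phrases(s), lz78_phrases(t)
    if not ps or not pt:
        return 0.0
    return len(ps & pt) / math.sqrt(len(ps) * len(pt))


if __name__ == "__main__":
    print(sorted(lz78_phrases("abab")))
    print(lz78_kernel("abab", "aaaa"))
```

A Gram matrix built from such a function could be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`. Unlike an n-gram kernel, this sketch has no length parameter to tune, since the parse itself determines which substrings become features.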
CITATION STYLE
Li, M., & Sleep, R. (2005). An LZ78 based string kernel. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3584 LNAI, pp. 678–689). Springer Verlag. https://doi.org/10.1007/11527503_80