This work addresses kernel-based learning algorithms for sequence data. We present a probabilistic approach to automatically extract, from the output of such string-kernel-based learning algorithms, the subsequences (or motifs) that truly underlie the machine's predictions. The proposed framework treats motifs as free parameters in a probabilistic model, which is solved through global optimization. In contrast to prevalent approaches, the proposed method can discover even difficult, long motifs, and it could be combined with any kernel-based learning algorithm that is based on an adequate sequence kernel. We show that, when paired with a discriminative kernel machine such as a support vector machine, the approach can reveal the discriminative motifs underlying the kernel predictor. We demonstrate the efficacy of our approach through a series of experiments on synthetic and real data, including problems from handwritten digit recognition and a large-scale human splice site data set from the domain of computational biology.
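The abstract leaves the choice of "adequate sequence kernel" open. As one hedged illustration, the sketch below implements the weighted degree (WD) string kernel, a common choice for fixed-length DNA sequences such as splice sites. It counts matching k-mers at identical positions for k = 1..`degree`, weighting shorter k-mers more heavily. This kernel choice is an assumption made for illustration, the names `wd_kernel` and `gram_matrix` are hypothetical, and the code is not the paper's motif-extraction method.

```python
from typing import List


def wd_kernel(x: str, y: str, degree: int = 3) -> float:
    """Weighted degree string kernel (illustrative sketch, assumed kernel choice).

    Sums, over k-mer lengths d = 1..degree, the number of positions where
    x and y share the same length-d substring, weighted by
    beta_d = 2 * (degree - d + 1) / (degree * (degree + 1)).
    """
    if len(x) != len(y):
        raise ValueError("the WD kernel compares equal-length sequences")
    length = len(x)
    total = 0.0
    for d in range(1, degree + 1):
        beta = 2.0 * (degree - d + 1) / (degree * (degree + 1))
        # Count positions where the length-d substrings coincide.
        matches = sum(
            1 for pos in range(length - d + 1) if x[pos:pos + d] == y[pos:pos + d]
        )
        total += beta * matches
    return total


def gram_matrix(seqs: List[str], degree: int = 3) -> List[List[float]]:
    """Pairwise kernel matrix over a list of equal-length sequences (hypothetical helper)."""
    return [[wd_kernel(a, b, degree) for b in seqs] for a in seqs]


if __name__ == "__main__":
    seqs = ["ACGT", "ACGA", "TTGT"]
    for row in gram_matrix(seqs):
        print([round(v, 3) for v in row])
```

A precomputed Gram matrix of this kind can be passed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`, yielding the type of kernel predictor whose motifs the abstract proposes to extract.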
Vidovic, M. M. C., Görnitz, N., Müller, K. R., Rätsch, G., & Kloft, M. (2015). Opening the black box: Revealing interpretable sequence motifs in kernel-based learning algorithms. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9285, pp. 137–153). Springer Verlag. https://doi.org/10.1007/978-3-319-23525-7_9