Computing q-gram non-overlapping frequencies on SLP compressed texts

Abstract

Length-q substrings, or q-grams, can represent important characteristics of text data, and determining the frequencies of all q-grams contained in the data is an important problem with many applications in the field of data mining and machine learning. In this paper, we consider the problem of calculating the non-overlapping frequencies of all q-grams in a text given in compressed form, namely, as a straight line program (SLP). We show that the problem can be solved in O(q^2 n) time and O(qn) space, where n is the size of the SLP. This generalizes and greatly improves previous work (Inenaga & Bannai, 2009), which solved the problem only for q = 2 in O(n^4 log n) time and O(n^3) space. © 2012 Springer-Verlag.
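
To make the problem statement concrete, the following is a minimal sketch of a naive baseline, not the algorithm from the paper: it fully decompresses the SLP and then counts non-overlapping occurrences of each q-gram with a greedy left-to-right scan. The rule encoding (a dict mapping each variable to either a single character or a pair of variables) and the function names `expand_slp` and `nonoverlapping_qgram_freq` are assumptions made for illustration. The sketch also assumes the usual meaning of non-overlapping frequency, namely the maximum number of pairwise non-overlapping occurrences; for equal-length occurrences, always taking the leftmost available one achieves that maximum.

```python
from collections import defaultdict


def expand_slp(rules, start):
    """Expand an SLP into the text it derives (naive; output may be exponentially long).

    Hypothetical encoding: rules[var] is either a one-character string
    (terminal rule) or a tuple (left_var, right_var) (concatenation rule).
    """
    memo = {}

    def expand(var):
        if var not in memo:
            rhs = rules[var]
            if isinstance(rhs, str):
                memo[var] = rhs
            else:
                memo[var] = expand(rhs[0]) + expand(rhs[1])
        return memo[var]

    return expand(start)


def nonoverlapping_qgram_freq(text, q):
    """Count non-overlapping occurrences of every q-gram by a greedy scan."""
    freq = defaultdict(int)
    next_free = {}  # q-gram -> first position where a new occurrence may start
    for i in range(len(text) - q + 1):
        gram = text[i:i + q]
        if next_free.get(gram, 0) <= i:
            freq[gram] += 1
            next_free[gram] = i + q
    return dict(freq)


# Example SLP deriving "ababab" (variable names are illustrative).
rules = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),  # ab
    "X4": ("X3", "X3"),  # abab
    "X5": ("X4", "X3"),  # ababab
}
text = expand_slp(rules, "X5")
print(nonoverlapping_qgram_freq(text, 2))
```

This baseline takes time proportional to the length of the decompressed text, which can be exponential in the SLP size n; the bounds stated in the abstract depend only on q and n, which is the point of working on the compressed representation directly.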

Cite

Goto, K., Bannai, H., Inenaga, S., & Takeda, M. (2012). Computing q-gram non-overlapping frequencies on SLP compressed texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7147 LNCS, pp. 301–312). https://doi.org/10.1007/978-3-642-27660-6_25
