Speeding up q-gram mining on grammar-based compressed texts

Keisuke Goto; Hideo Bannai; Shunsuke Inenaga; Masayuki Takeda

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7354 LNCS 220-231

DOI: 10.1007/978-3-642-31265-6_18

Abstract

We present an efficient algorithm for calculating q-gram frequencies on strings represented in compressed form, namely, as a straight line program (SLP). Given an SLP 𝒯 of size n that represents string T, the algorithm computes the occurrence frequencies of all q-grams in T by reducing the problem to the weighted q-gram frequencies problem on a trie-like structure of size m = |T| − dup(q, 𝒯), where dup(q, 𝒯) is a quantity that represents the amount of redundancy that the SLP captures with respect to q-grams. The reduced problem can be solved in linear time. Since m = O(qn), the running time of our algorithm is O(min{qn, |T| − dup(q, 𝒯)}), improving our previous O(qn) algorithm when q = Ω(|T|/n). © 2012 Springer-Verlag.
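
For illustration only, the following Python sketch shows one simple way to compute q-gram frequencies directly on an SLP without decompressing it, in time roughly proportional to qn. For each rule it counts the q-grams that cross the boundary between the rule's two children and weights them by how often the rule occurs in the derivation tree. This is a baseline-style approach under assumed conventions, not the trie-based reduction of size m described in the abstract; the `rules` encoding and the function name `qgram_frequencies` are hypothetical.

```python
from collections import Counter


def qgram_frequencies(rules, q):
    """Count q-grams of the string derived by an SLP (illustrative sketch).

    Assumed encoding (hypothetical): rules[i] is either a single character
    or a pair (l, r) with l, r < i; rules[-1] derives the whole text T.
    """
    n = len(rules)
    k = q - 1                      # boundary context kept on each side
    length = [0] * n               # length of each variable's expansion
    pre = [""] * n                 # first min(k, length) characters
    suf = [""] * n                 # last min(k, length) characters

    def tail(s):
        return s[max(0, len(s) - k):] if k else ""

    for i, rule in enumerate(rules):
        if isinstance(rule, str):
            length[i] = 1
            pre[i] = rule[:k]
            suf[i] = tail(rule)
        else:
            l, r = rule
            length[i] = length[l] + length[r]
            pre[i] = (pre[l] + pre[r])[:k]
            suf[i] = tail(suf[l] + suf[r])

    # vocc[i]: number of nodes labelled with variable i in the derivation tree.
    vocc = [0] * n
    vocc[-1] = 1
    for i in range(n - 1, -1, -1):
        if not isinstance(rules[i], str):
            l, r = rules[i]
            vocc[l] += vocc[i]
            vocc[r] += vocc[i]

    counts = Counter()
    for i, rule in enumerate(rules):
        if vocc[i] == 0:
            continue               # variable unreachable from the start rule
        if isinstance(rule, str):
            if q == 1:
                counts[rule] += vocc[i]
            continue
        l, r = rule
        # Every length-q window of t straddles the boundary between l and r,
        # because each side contributes at most q - 1 characters.
        t = suf[l] + pre[r]
        for j in range(len(t) - q + 1):
            counts[t[j:j + q]] += vocc[i]
    return counts
```

As a usage example, the rules `["a", "b", (0, 1), (2, 2), (3, 3)]` derive the string `abababab`, and calling `qgram_frequencies` on them with `q = 2` counts `ab` four times and `ba` three times. Each rule contributes a string of length at most 2(q − 1), which is where the qn-type bound comes from.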

Citation (APA)

Goto, K., Bannai, H., Inenaga, S., & Takeda, M. (2012). Speeding up q-gram mining on grammar-based compressed texts. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7354 LNCS, pp. 220–231). https://doi.org/10.1007/978-3-642-31265-6_18
