Linear-size CDAWG: New repetition-aware indexing and grammar compression

Takuya Takagi; Keisuke Goto; Yuta Fujishige; Shunsuke Inenaga; Hiroki Arimura

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10508 LNCS 304-316

DOI: 10.1007/978-3-319-67428-5_26

11 Citations

11 Readers

Abstract

In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O(ẽ_T log n) bits of space allowing for O(log n)-time random and O(1)-time sequential accesses to edge labels, and O(m log σ + occ)-time pattern matching. Here, ẽ_T is the number of all extensions of maximal repeats in T, n and m are respectively the lengths of the text T and a given pattern, σ is the alphabet size, and occ is the number of occurrences of the pattern in T. The repetitiveness measure ẽ_T is known to be much smaller than the text length n for highly repetitive text. For constant alphabets, our L-CDAWGs achieve O(m + occ) pattern matching time with O(e_T^r log n) bits of space, which improves the pattern matching time of Belazzougui et al.’s run-length BWT-CDAWGs by a factor of log log n, with the same space complexity. Here, e_T^r is the number of right extensions of maximal repeats in T. As a byproduct, our result gives a way of constructing a straight-line program (SLP) of size O(ẽ_T) for a given text T in O(n + ẽ_T log σ) time.
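To make the notion of a straight-line program concrete, the following is a minimal sketch of a generic SLP with random access to a single character of the text it derives. It is not the L-CDAWG construction from the paper: the rule encoding, the function names (`expansion_lengths`, `access`, `expand`), and the example grammar are all assumptions made for illustration, and access here takes time proportional to the height of the grammar rather than the O(log n) bound stated in the abstract.

```python
from typing import List, Tuple, Union

# Hypothetical encoding: rule i is either a single terminal character
# or a pair (left, right) of indices of earlier rules.
Rule = Union[str, Tuple[int, int]]


def expansion_lengths(rules: List[Rule]) -> List[int]:
    """Length of the string derived by each rule, computed bottom-up."""
    lengths: List[int] = []
    for rule in rules:
        if isinstance(rule, str):
            lengths.append(1)
        else:
            left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths


def access(rules: List[Rule], lengths: List[int], root: int, i: int) -> str:
    """Return character i of the string derived by `root` without expanding it."""
    if not 0 <= i < lengths[root]:
        raise IndexError("position out of range")
    node = root
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i < lengths[left]:
            node = left            # position falls in the left child
        else:
            i -= lengths[left]     # skip the left child's expansion
            node = right
    return rules[node]


def expand(rules: List[Rule], root: int) -> str:
    """Fully expand a rule (linear in the output length), for checking only."""
    out: List[str] = []
    stack = [root]
    while stack:
        rule = rules[stack.pop()]
        if isinstance(rule, str):
            out.append(rule)
        else:
            left, right = rule
            stack.append(right)    # push right first so left is expanded first
            stack.append(left)
    return "".join(out)


# Illustrative grammar: 0 -> a, 1 -> b, 2 -> ab, 3 -> abab, 4 -> abababab
RULES: List[Rule] = ["a", "b", (0, 1), (2, 2), (3, 3)]
LENGTHS = expansion_lengths(RULES)
```

In this example five rules derive a text of length eight, which illustrates why a grammar can be much smaller than a highly repetitive text; calling `access(RULES, LENGTHS, 4, i)` walks one root-to-leaf path instead of expanding the whole string.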

Cite

Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., & Arimura, H. (2017). Linear-size CDAWG: New repetition-aware indexing and grammar compression. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10508 LNCS, pp. 304–316). Springer Verlag. https://doi.org/10.1007/978-3-319-67428-5_26
