Linear-size CDAWG: New repetition-aware indexing and grammar compression

11Citations
Citations of this article
11Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In this paper, we propose a novel approach to combine compact directed acyclic word graphs (CDAWGs) and grammar-based compression. This leads us to an efficient self-index, called Linear-size CDAWGs (L-CDAWGs), which can be represented with O(ẽT log n) bits of space allowing for O(log n) -time random and O(1)-time sequential accesses to edge labels, and O(m log σ + occ) -time pattern matching. Here, ẽT is the number of all extensions of maximal repeats in T, n and m are respectively the lengths of the text T and a given pattern, σ is the alphabet size, and occ is the number of occurrences of the pattern in T. The repetitiveness measure ẽT is known to be much smaller than the text length n for highly repetitive text. For constant alphabets, our L-CDAWGs achieve O(m + occ ) pattern matching time with O(eTr log n) bits of space, which improves the pattern matching time of Belazzougui et al.’s run-length BWT-CDAWGs by a factor of log log n, with the same space complexity. Here, eTr is the number of right extensions of maximal repeats in T. As a byproduct, our result gives a way of constructing a straight-line program (SLP) of size O(ẽT) for a given text T in O(n + ẽT log σ) time.

Cite

CITATION STYLE

APA

Takagi, T., Goto, K., Fujishige, Y., Inenaga, S., & Arimura, H. (2017). Linear-size CDAWG: New repetition-aware indexing and grammar compression. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10508 LNCS, pp. 304–316). Springer Verlag. https://doi.org/10.1007/978-3-319-67428-5_26

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free