We introduce LZ-End, a new member of the Lempel-Ziv family of text compressors, which achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary text substrings. We then build the first self-index based on LZ77 (or LZ-End) compression, which, in addition to text extraction, offers fast indexed searches on the compressed text. This self-index is particularly effective for representing highly repetitive sequence collections, which arise, for example, when storing versioned documents, software repositories, periodic publications, and biological sequence databases. © 2012 Elsevier B.V. All rights reserved.
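The abstract does not spell out how an LZ-End parse is formed. As a hedged illustration, assuming the rule as LZ-End is usually stated (each new phrase copies the longest prefix of the unparsed text that also occurs ending exactly at the end of some earlier phrase, then appends one explicit character), the naive Python sketch below builds such a parse and decodes it back. The names `lzend_parse` and `lzend_decode` and the `(source_phrase, length, char)` phrase encoding are hypothetical, chosen only for this example. It runs in cubic time, always keeps the final character explicit as a simplification, and is not the authors' construction or extraction algorithm.

```python
from typing import List, Optional, Tuple

# Hypothetical phrase encoding for this sketch: (source_phrase, length, char).
# Copy `length` characters that end exactly where phrase `source_phrase`
# ends, then append the literal `char`. source_phrase is None when length == 0.
Phrase = Tuple[Optional[int], int, str]


def lzend_parse(text: str) -> List[Phrase]:
    """Naive LZ-End-style parse (cubic time; illustration only)."""
    phrases: List[Phrase] = []
    ends: List[int] = []  # ends[q] = text position just past phrase q
    i, n = 0, len(text)
    while i < n:
        best_q, best_len = None, 0
        max_len = n - i - 1  # leave room for the explicit trailing character
        for q, e in enumerate(ends):
            # Longest prefix of text[i:] that is also a suffix of text[:e],
            # i.e. a source that ends exactly at the end of phrase q.
            for length in range(min(e, max_len), best_len, -1):
                if text[i:i + length] == text[e - length:e]:
                    best_q, best_len = q, length
                    break
        phrases.append((best_q, best_len, text[i + best_len]))
        i += best_len + 1
        ends.append(i)
    return phrases


def lzend_decode(phrases: List[Phrase]) -> str:
    """Rebuild the text by copying each source and appending its literal."""
    out: List[str] = []
    ends: List[int] = []
    for q, length, c in phrases:
        if length:
            e = ends[q]
            out.extend(out[e - length:e])  # source ends at a phrase boundary
        out.append(c)
        ends.append(len(out))
    return "".join(out)
```

For example, `lzend_parse("abababab")` yields four phrases: `a`, `b`, then `ab` copied from the end of phrase 1 plus `a`, then `ba` copied from the end of phrase 2 plus `b`. In LZ77, by contrast, a source may end anywhere in the preceding text, so forcing sources to end at phrase boundaries generally produces at least as many phrases; per the abstract, the resulting compression ratios remain close to those of LZ77.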
Kreft, S., & Navarro, G. (2013). On compressing and indexing repetitive sequences. Theoretical Computer Science, 483, 115–133. https://doi.org/10.1016/j.tcs.2012.02.006