Towards a Definitive Measure of Repetitiveness

22Citations
Citations of this article
16Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Unlike in statistical compression, where Shannon’s entropy is a definitive lower bound, no such clear measure exists for the compressibility of repetitive sequences. Since statistical entropy does not capture repetitiveness, ad-hoc measures like the size z of the Lempel–Ziv parse are frequently used to estimate repetitiveness. Recently, a more principled measure, the size γ of the smallest string attractor, was introduced. The measure γ lower bounds all the previous relevant ones (including z), yet length-n strings can be represented and efficiently indexed within space O(γlognγ), which also upper bounds most measures (including z). While γ is certainly a better measure of repetitiveness than z, it is NP-complete to compute, and no o(γlog n) -space representation of strings is known. In this paper, we study a smaller measure, δ≤ γ, which can be computed in linear time. We show that δ better captures the compressibility of repetitive strings. For every length n and every value δ≥ 2, we construct a string such that γ=Ω(δlognδ). Still, we show a representation of any string S in O(δlognδ) space that supports direct access to any character S[i] in time O(lognδ) and finds the occ occurrences of any pattern P[1.m] in time O(mlog n+ occlogεn) for any constant ε> 0. Further, we prove that no o(δlog n) -space representation exists: for every length n and every value 2 ≤ δ≤ n1-ε, we exhibit a string family whose elements can only be encoded in Ω(δlognδ) space. We complete our characterization of δ by showing that, although γ, z, and other repetitiveness measures are always O(δlognδ), for strings of any length n, the smallest context-free grammar can be of size Ω(δlog2n/ log log n). No such separation is known for γ.

Cite

CITATION STYLE

APA

Kociumaka, T., Navarro, G., & Prezza, N. (2020). Towards a Definitive Measure of Repetitiveness. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12118 LNCS, pp. 207–219). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61792-9_17

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free