A new trend in the field of pattern matching is to design indexing data structures which take space very close to that required by the indexed text (in entropy-compressed form) and also simultaneously achieve good query performance. Two popular indexes, namely the FM-index [Ferragina and Manzini, 2005] and the CSA [Grossi and Vitter 2005], achieve this goal by exploiting the Burrows-Wheeler transform (BWT) [Burrows and Wheeler, 1994]. However, due to the intricate permutation structure of BWT, no locality of reference can be guaranteed when we perform pattern matching with these indexes. Chien et al. [2008] gave an alternative text index which is based on sparsifying the traditional suffix tree and maintaining an auxiliary 2-D range query structure. Given a text T of length n drawn from a σ-sized alphabet set, they achieved O(n logσ)-bit index for T and showed that this index can preserve locality in pattern matching and hence is amenable to be used in external-memory settings. We improve upon this index and show how to apply entropy compression to reduce index space. Our index takes O(n(H k + 1)) + o(nlogσ) bits of space where H k is the kth-order empirical entropy of the text. This is achieved by creating variable length blocks of text using arithmetic coding. © 2009 Springer.
CITATION STYLE
Hon, W. K., Shah, R., Thankachan, S. V., & Vitter, J. S. (2009). On entropy-compressed text indexing in external memory. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5721 LNCS, pp. 75–89). https://doi.org/10.1007/978-3-642-03784-9_8
Mendeley helps you to discover research relevant for your work.