On searching compressed string collections cache-obliviously

Paolo Ferragina; Roberto Grossi; Ankur Gupta; Rahul Shah; Jeffrey Scott Vitter

Conference Proceedings

Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (2008) 181-190

DOI: 10.1145/1376916.1376943

36 Citations

23 Readers

Abstract

Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and study how close their space occupancy is to the information-theoretic minimum. The moral is that they are not just heuristics. Our second contribution is a novel dictionary encoding scheme that builds upon such linearizations and achieves nearly optimal space, offers competitive I/O-search time, and is also conscious of the query distribution. Finally, we combine those data structures with cache-oblivious tries [2, 5] and obtain a succinct variant whose space is close to the information-theoretic minimum. Copyright 2008 ACM.
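
The abstract refers to front coding without describing it. As a rough illustration only, the textbook form of front coding stores each string of a sorted list as the length of the prefix it shares with its predecessor plus the remaining suffix. The sketch below assumes that plain form; the function names (`lcp_len`, `front_encode`, `front_decode`) are hypothetical, and this is not the paper's linearization or dictionary encoding scheme.

```python
def lcp_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def front_encode(sorted_strings):
    """Encode a sorted list as (shared_prefix_length, remaining_suffix) pairs."""
    encoded = []
    prev = ""
    for s in sorted_strings:
        k = lcp_len(prev, s)
        encoded.append((k, s[k:]))  # keep only what differs from the predecessor
        prev = s
    return encoded


def front_decode(encoded):
    """Rebuild the original strings by scanning the pairs left to right."""
    decoded = []
    prev = ""
    for k, suffix in encoded:
        s = prev[:k] + suffix
        decoded.append(s)
        prev = s
    return decoded


# Illustrative input, not taken from the paper.
words = sorted(["car", "card", "care", "cat", "dog"])
encoded = front_encode(words)
```

Decoding any one string in this plain form requires scanning from the start of the list, which is why practical variants usually restart the encoding at regular intervals; that bucketing is omitted here for brevity.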

Citation (APA)

Ferragina, P., Grossi, R., Gupta, A., Shah, R., & Vitter, J. S. (2008). On searching compressed string collections cache-obliviously. In Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems (pp. 181–190). https://doi.org/10.1145/1376916.1376943
