Improved compressed string dictionaries

Nieves R. Brisaboa; Guillermo De Bernardo; Ana Cerdeira-Pena; Gonzalo Navarro

Conference Proceedings, Open Access

International Conference on Information and Knowledge Management, Proceedings (2019) 29-38

DOI: 10.1145/3357384.3357972

9 Citations

22 Readers

Abstract

We introduce a new family of compressed data structures to efficiently store and query large string dictionaries in main memory. Our main technique is a combination of hierarchical Front-coding with ideas from longest-common-prefix computation in suffix arrays. Our data structures yield relevant space-time tradeoffs in real-world dictionaries. We focus on two domains where string dictionaries are extensively used and efficient compression is required: URL collections, a key element in Web graphs and applications such as Web mining; and collections of URIs and literals, the basic components of RDF datasets. Our experiments show that our data structures achieve better compression than the state-of-the-art alternatives while providing very competitive query times.
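For readers unfamiliar with Front-coding, the following is a minimal sketch of plain, bucketed Front-coding only. It is not the hierarchical variant or the longest-common-prefix techniques the paper introduces, and the class name `FrontCodedDict`, its methods, and the `bucket_size` parameter are hypothetical names chosen for illustration. Each bucket stores its first string in full and every later string as a pair (length of the prefix shared with the previous string, remaining suffix).

```python
from bisect import bisect_right
from typing import Iterable, List, Optional, Tuple


def lcp(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


class FrontCodedDict:
    """Illustrative bucketed Front-coding (hypothetical API, not the paper's)."""

    def __init__(self, strings: Iterable[str], bucket_size: int = 4) -> None:
        words = sorted(set(strings))
        self.size = len(words)
        self.bucket_size = bucket_size
        self.headers: List[str] = []  # first string of each bucket, stored in full
        self.buckets: List[List[Tuple[int, str]]] = []  # (shared prefix length, suffix)
        for start in range(0, len(words), bucket_size):
            chunk = words[start:start + bucket_size]
            self.headers.append(chunk[0])
            # Encode each string relative to its predecessor in the bucket.
            self.buckets.append(
                [(lcp(prev, cur), cur[lcp(prev, cur):]) for prev, cur in zip(chunk, chunk[1:])]
            )

    def extract(self, i: int) -> str:
        """Return the string with identifier i (0-based rank in sorted order)."""
        if not 0 <= i < self.size:
            raise IndexError(i)
        b, offset = divmod(i, self.bucket_size)
        cur = self.headers[b]
        for k, suffix in self.buckets[b][:offset]:
            cur = cur[:k] + suffix  # rebuild from the previous string
        return cur

    def locate(self, s: str) -> Optional[int]:
        """Return the identifier of s, or None if s is not in the dictionary."""
        b = bisect_right(self.headers, s) - 1  # last bucket whose header <= s
        if b < 0:
            return None
        cur = self.headers[b]
        if cur == s:
            return b * self.bucket_size
        for offset, (k, suffix) in enumerate(self.buckets[b], start=1):
            cur = cur[:k] + suffix
            if cur == s:
                return b * self.bucket_size + offset
        return None


urls = [
    "http://a.com/",
    "http://a.com/x",
    "http://a.com/y",
    "http://b.org/",
    "http://b.org/z",
]
d = FrontCodedDict(urls, bucket_size=2)
```

In this sketch, larger buckets store fewer full strings but require more sequential decoding per query, which is the kind of space-time tradeoff the abstract refers to.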

Cite

Brisaboa, N. R., De Bernardo, G., Cerdeira-Pena, A., & Navarro, G. (2019). Improved compressed string dictionaries. In International Conference on Information and Knowledge Management, Proceedings (pp. 29–38). Association for Computing Machinery. https://doi.org/10.1145/3357384.3357972
