Improved compressed indexes for full-text document retrieval

Djamal Belazzougui; Gonzalo Navarro

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2011) 7024 LNCS 386-397

DOI: 10.1007/978-3-642-24583-1_38

Abstract

We give new space/time tradeoffs for compressed indexes that answer document retrieval queries on general sequences. On a collection of D documents of total length n, current approaches require at least |CSA| + O(n lg D / lg lg D) or 2|CSA| + o(n) bits of space, where CSA is a full-text index. Using monotone minimum perfect hash functions, we give new algorithms for document listing with frequencies and top-k document retrieval using just |CSA| + O(n lg lg lg D) bits. We also improve current solutions that use 2|CSA| + o(n) bits, and consider other problems such as colored range listing, top-k most important documents, and computing arbitrary frequencies. © 2011 Springer-Verlag.

Citation (APA)

Belazzougui, D., & Navarro, G. (2011). Improved compressed indexes for full-text document retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7024 LNCS, pp. 386–397). https://doi.org/10.1007/978-3-642-24583-1_38
