Space-efficient algorithms for document retrieval

Niko Välimäki; Veli Mäkinen

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007), vol. 4580 LNCS, pp. 205-215

DOI: 10.1007/978-3-540-73437-6_22



Abstract

We study the Document Listing problem, where a collection D of documents d_1, ..., d_k of total length Σ_i |d_i| = n is to be preprocessed so that one can later efficiently list all the ndoc documents containing a given query pattern P of length m as a substring. Muthukrishnan (SODA 2002) gave an optimal solution to the problem: with O(n) time preprocessing, one can answer the queries in O(m + ndoc) time. In this paper, we improve the space requirement of Muthukrishnan's solution from O(n log n) bits to |CSA| + 2n + n log k (1 + o(1)) bits, where |CSA| ≤ n log |Σ| (1 + o(1)) is the size of any suitable compressed suffix array (CSA), and Σ is the underlying alphabet of the documents. The time requirement depends on the CSA used, but we can obtain, e.g., the optimal O(m + ndoc) time when |Σ|, k = O(polylog(n)). For general |Σ|, k the time requirement becomes O(m log |Σ| + ndoc log k). Sadakane (ISAAC 2002) has developed a similar space-efficient variant of Muthukrishnan's solution; we obtain a better time requirement in most cases, but a slightly worse space requirement. © Springer-Verlag Berlin Heidelberg 2007.
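To make the Document Listing problem concrete, the following Python sketch illustrates the general idea commonly associated with Muthukrishnan-style listing: for each suffix-array position, record the previous position that belongs to the same document, then use range-minimum queries over that array to report each matching document exactly once. It is an illustration under simplifying assumptions, not the space-efficient structure of this paper: the suffix array is built naively, the range minimum is a linear scan rather than a constant-time structure, no compressed suffix array is used, and all function names (build_index, sa_range, list_documents, document_listing) are hypothetical. The separator character is assumed not to occur in any document or pattern.

```python
from bisect import bisect_right

SEP = "\x00"  # assumed absent from all documents and patterns


def build_index(docs):
    """Build a naive suffix array plus the per-position document arrays."""
    text = "".join(d + SEP for d in docs)
    # starts[j] = position in text where document j begins
    starts, pos = [], 0
    for d in docs:
        starts.append(pos)
        pos += len(d) + 1
    # Naive suffix array: sort all suffix start positions lexicographically.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # doc_of[i] = index of the document containing suffix sa[i]
    doc_of = [bisect_right(starts, p) - 1 for p in sa]
    # prev[i] = largest j < i with doc_of[j] == doc_of[i], or -1 if none
    last, prev = {}, []
    for i, d in enumerate(doc_of):
        prev.append(last.get(d, -1))
        last[d] = i
    return text, sa, doc_of, prev


def sa_range(text, sa, pattern):
    """Return the suffix-array interval (sp, ep) of the pattern, or None."""
    m = len(pattern)
    hits = [i for i, p in enumerate(sa) if text[p:p + m] == pattern]
    # Suffixes sharing a prefix are contiguous in sorted order.
    return (hits[0], hits[-1]) if hits else None


def list_documents(doc_of, prev, sp, ep):
    """Report each document occurring in doc_of[sp..ep] exactly once."""
    out, stack = [], [(sp, ep)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        # Linear-scan stand-in for a constant-time range-minimum query.
        i = min(range(lo, hi + 1), key=prev.__getitem__)
        if prev[i] < sp:
            # First occurrence of this document inside [sp, ep]: report it
            # and keep searching on both sides of position i.
            out.append(doc_of[i])
            stack.append((lo, i - 1))
            stack.append((i + 1, hi))
        # Otherwise every position in [lo, hi] repeats a document seen
        # earlier in [sp, ep], so this subrange is skipped entirely.
    return out


def document_listing(docs, pattern):
    """Return the sorted indices of documents containing the pattern."""
    text, sa, doc_of, prev = build_index(docs)
    r = sa_range(text, sa, pattern)
    return sorted(list_documents(doc_of, prev, *r)) if r else []
```

For example, document_listing(["banana", "bandana", "cabana", "apple"], "band") returns the indices of the documents that contain "band" as a substring. In this sketch each reported document costs one range-minimum query plus constant bookkeeping, which is where an O(m + ndoc)-style bound would come from once the pattern search and the range minimum are replaced by efficient structures.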

Cite

CITATION STYLE

APA

Välimäki, N., & Mäkinen, V. (2007). Space-efficient algorithms for document retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4580 LNCS, pp. 205–215). Springer Verlag. https://doi.org/10.1007/978-3-540-73437-6_22
