Space-efficient algorithms for document retrieval


Abstract

We study the Document Listing problem, where a collection D of documents d_1, ..., d_k of total length Σ_i |d_i| = n is to be preprocessed so that one can later efficiently list all the ndoc documents containing a given query pattern P of length m as a substring. Muthukrishnan (SODA 2002) gave an optimal solution to the problem: with O(n) time preprocessing, one can answer the queries in O(m + ndoc) time. In this paper, we improve the space requirement of Muthukrishnan's solution from O(n log n) bits to |CSA| + 2n + n log k (1 + o(1)) bits, where |CSA| ≤ n log |Σ| (1 + o(1)) is the size of any suitable compressed suffix array (CSA), and Σ is the underlying alphabet of the documents. The time requirement depends on the CSA used, but we can obtain, e.g., the optimal O(m + ndoc) time when |Σ|, k = O(polylog(n)). For general |Σ| and k, the time requirement becomes O(m log |Σ| + ndoc log k). Sadakane (ISAAC 2002) has developed a similar space-efficient variant of Muthukrishnan's solution; we obtain a better time requirement in most cases, but a slightly worse space requirement. © Springer-Verlag Berlin Heidelberg 2007.
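To make the problem statement concrete, the following is a minimal Python sketch of the general idea commonly attributed to Muthukrishnan's approach: sort all suffixes of all documents, record which document each suffix came from, and report each document in the pattern's suffix range exactly once by following "previous occurrence of the same document" links. It is an illustration only. It uses a naively sorted suffix list and a linear-scan range minimum, so it matches neither the O(m + ndoc) query time nor the compressed space bounds discussed in the abstract. All names (`build_index`, `list_documents`, `prev`) are hypothetical and do not come from the paper.

```python
from bisect import bisect_left


def build_index(docs):
    """Naive index: sorted suffixes, a document array, and previous-occurrence links."""
    # Every suffix of every document, tagged with its document id, in sorted order.
    pairs = sorted(
        (text[i:], d) for d, text in enumerate(docs) for i in range(len(text))
    )
    suffixes = [s for s, _ in pairs]
    doc_array = [d for _, d in pairs]

    # prev[j] = largest j' < j with doc_array[j'] == doc_array[j], or -1 if none.
    last_seen = {}
    prev = []
    for j, d in enumerate(doc_array):
        prev.append(last_seen.get(d, -1))
        last_seen[d] = j
    return suffixes, doc_array, prev


def list_documents(index, pattern):
    """Return the sorted ids of all documents containing `pattern`, each once."""
    suffixes, doc_array, prev = index

    # Suffixes that start with the pattern form one contiguous range [sp, ep].
    sp = bisect_left(suffixes, pattern)
    ep = sp
    while ep < len(suffixes) and suffixes[ep].startswith(pattern):
        ep += 1
    ep -= 1

    found = []
    stack = [(sp, ep)]
    while stack:
        lo, hi = stack.pop()
        if lo > hi:
            continue
        # Position of the minimum prev value in [lo, hi] (linear scan here;
        # a real solution would use a constant-time range minimum structure).
        pos = min(range(lo, hi + 1), key=prev.__getitem__)
        if prev[pos] >= sp:
            # Every document in this subrange already occurs earlier in [sp, ep].
            continue
        found.append(doc_array[pos])  # first occurrence of this document in range
        stack.append((lo, pos - 1))
        stack.append((pos + 1, hi))
    return sorted(found)


if __name__ == "__main__":
    index = build_index(["banana", "bandana", "cabana", "apple"])
    print(list_documents(index, "ana"))
    print(list_documents(index, "pp"))
```

A position whose `prev` link points before the start of the range is the first occurrence of its document inside the range, which is why each matching document is reported exactly once regardless of how many times the pattern occurs in it.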

Cite

Välimäki, N., & Mäkinen, V. (2007). Space-efficient algorithms for document retrieval. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4580 LNCS, pp. 205–215). Springer Verlag. https://doi.org/10.1007/978-3-540-73437-6_22
