A document-centric approach to static index pruning in text retrieval systems


Abstract

We present a static index pruning method for ad-hoc document retrieval tasks that follows a document-centric approach to deciding whether a posting for a given term should remain in the index. The decision is based on the term's contribution to the Kullback-Leibler divergence of the document's language model from the text collection's global language model. Our technique can decrease the size of the index by over 90%, with only a minor decrease in retrieval effectiveness. It thus allows us to make the index small enough to fit entirely into the main memory of a single PC, even for large text collections containing millions of documents. This yields substantial efficiency gains, superior to those of earlier pruning methods, and an average query response time of around 20 ms on the GOV2 document collection. Copyright 2006 ACM.
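The pruning criterion can be illustrated with a minimal Python sketch. For a term t in document d, its contribution to the Kullback-Leibler divergence of the document model P_d from the collection model P_C is P_d(t) · log(P_d(t) / P_C(t)). The sketch makes several simplifying assumptions of its own: unsmoothed maximum-likelihood language models, pre-tokenized documents, and a hypothetical `keep_fraction` parameter that keeps a fixed fraction of each document's distinct terms. The function names are also hypothetical. This is not the authors' implementation, and the paper's exact selection rule and parameter settings may differ.

```python
import math
from collections import Counter


def kl_contributions(doc_tokens, collection_tf, collection_len):
    """Return each term's contribution to KL(P_d || P_C).

    contribution(t) = P_d(t) * log(P_d(t) / P_C(t)), where both models are
    unsmoothed maximum-likelihood estimates (an assumption of this sketch).
    """
    doc_tf = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    scores = {}
    for term, tf in doc_tf.items():
        p_d = tf / doc_len
        p_c = collection_tf[term] / collection_len
        scores[term] = p_d * math.log(p_d / p_c)
    return scores


def prune_index(docs, keep_fraction=0.1):
    """Build a pruned inverted index mapping term -> [(doc_id, tf), ...].

    Document-centric rule (hypothetical parameterization): for each document,
    only the top keep_fraction of its distinct terms, ranked by KL
    contribution (at least one term), keep their postings.
    """
    # Global (collection) language model statistics.
    collection_tf = Counter()
    for tokens in docs.values():
        collection_tf.update(tokens)
    collection_len = sum(collection_tf.values())

    index = {}
    for doc_id, tokens in docs.items():
        if not tokens:
            continue  # empty documents contribute no postings
        scores = kl_contributions(tokens, collection_tf, collection_len)
        k = max(1, math.ceil(keep_fraction * len(scores)))
        # Highest contribution first; ties broken alphabetically for determinism.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        tf = Counter(tokens)
        for term in ranked[:k]:
            index.setdefault(term, []).append((doc_id, tf[term]))
    return index


docs = {
    "d1": "the cat sat on the mat".split(),
    "d2": "the dog chased the cat".split(),
    "d3": "the stock market fell sharply".split(),
}
index = prune_index(docs, keep_fraction=0.5)
for term in sorted(index):
    print(term, index[term])
```

In this toy run, frequent, collection-wide terms such as "the" and "cat" score low in every document and lose all their postings, while terms that distinguish a document from the collection survive. Ranking terms within each document, rather than applying one global score threshold, guarantees that every non-empty document keeps at least some postings.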

Citation

Büttcher, S., & Clarke, C. L. A. (2006). A document-centric approach to static index pruning in text retrieval systems. In International Conference on Information and Knowledge Management, Proceedings (pp. 182–189). https://doi.org/10.1145/1183614.1183644
