Indexing shared content in information retrieval systems

Abstract

Modern document collections often contain groups of documents with overlapping or shared content. However, most information retrieval systems process each document separately, causing shared content to be indexed multiple times. In this paper, we describe a new document representation model where related documents are organized as a tree, allowing shared content to be indexed just once. We show how this representation model can be encoded in an inverted index and we describe algorithms for evaluating free-text queries based on this encoding. We also show how our representation model applies to web, email, and newsgroup search. Finally, we present experimental results showing that our methods can provide a significant reduction in the size of an inverted index as well as in the time to build and query it. © Springer-Verlag Berlin Heidelberg 2006.
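As a rough illustration of the idea, the sketch below models an email thread as a tree in which each node stores only the text it adds, and a document's full content is assumed to be the text along the path from the root to its node. This is a simplified toy model, not the encoding or the query algorithms from the paper; the class name `SharedContentIndex`, its methods, and the sample messages are all hypothetical.

```python
from collections import defaultdict


class SharedContentIndex:
    """Toy inverted index over a tree of documents (illustrative only).

    Assumption: a node's text is shared with all of its descendants, so a
    document's full content is the text on the path from the root to it.
    """

    def __init__(self):
        self.parent = {}                   # node id -> parent id (None for a root)
        self.postings = defaultdict(set)   # term -> ids of nodes that store the term

    def add(self, node_id, text, parent=None):
        """Index only the text that this node adds to its ancestors' content."""
        self.parent[node_id] = parent
        for term in text.lower().split():
            self.postings[term].add(node_id)

    def _path(self, node_id):
        """Yield the node and then each of its ancestors up to the root."""
        while node_id is not None:
            yield node_id
            node_id = self.parent[node_id]

    def search(self, query):
        """Return ids of documents whose root-to-node path contains every term."""
        terms = query.lower().split()
        hits = []
        for node_id in self.parent:
            path = set(self._path(node_id))
            if all(self.postings.get(term, set()) & path for term in terms):
                hits.append(node_id)
        return sorted(hits)


# A hypothetical four-message thread: replies inherit the quoted original.
idx = SharedContentIndex()
idx.add("msg1", "budget meeting moved to friday")
idx.add("msg2", "works for me", parent="msg1")
idx.add("msg3", "can we do monday instead", parent="msg1")
idx.add("msg4", "monday is fine", parent="msg3")

print(idx.postings["budget"])        # stored once, at the root message only
print(idx.search("budget"))          # every message in the thread matches
print(idx.search("friday monday"))   # only the replies that contain both terms
```

The brute-force scan in `search` is only there to keep the sketch short; the point is that "budget" occupies a single posting even though four documents contain it.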

Broder, A. Z., Eiron, N., Fontoura, M., Herscovici, M., Lempel, R., McPherson, J., … Shekita, E. (2006). Indexing shared content in information retrieval systems. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3896 LNCS, pp. 313–330). https://doi.org/10.1007/11687238_21