Document compaction for efficient query biased snippet generation

Abstract

Current web search engines return query-biased snippets for each document they list in a result set. For efficiency, search engines operating on large collections need to cache snippets for common queries, and to cache documents to allow fast generation of snippets for uncached queries. To improve the hit rate on a document cache during snippet generation, we propose and evaluate several schemes for reducing document size, hence increasing the number of documents in the cache. In particular, we argue against further improvements to document compression, and argue for schemes that prune documents based on the a priori likelihood that a sentence will be used as part of a snippet for a given document. Our experiments show that if documents are reduced to less than half their original size, 80% of snippets generated are identical to those generated from the original documents. Moreover, as the pruned, compressed surrogates are smaller, 3-4 times as many documents can be cached. © Springer-Verlag Berlin Heidelberg 2009.
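
To make the pruning idea concrete, here is a minimal sketch of query-independent sentence pruning. It is not the authors' scheme: the abstract does not say how the a priori likelihood is estimated, so the score below (mean in-document term frequency plus a small bonus for early sentences) is a placeholder assumption. The names `split_sentences`, `prune_document`, and `keep_fraction` are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def prune_document(text, keep_fraction=0.5):
    """Keep the highest-scoring sentences within a character budget.

    The score is a stand-in for an a priori snippet likelihood; it does not
    depend on any query, so pruning can be done once, before caching.
    """
    sentences = split_sentences(text)
    if not sentences:
        return ""

    budget = int(len(text) * keep_fraction)
    tokens = [re.findall(r"[a-z0-9]+", s.lower()) for s in sentences]
    term_freq = Counter(t for toks in tokens for t in toks)

    def score(i):
        toks = tokens[i]
        if not toks:
            return 0.0
        mean_tf = sum(term_freq[t] for t in toks) / len(toks)
        position_bonus = 1.0 / (1 + i)  # placeholder heuristic: favour early sentences
        return mean_tf + position_bonus

    ranked = sorted(range(len(sentences)), key=score, reverse=True)

    kept, used = set(), 0
    for i in ranked:
        cost = len(sentences[i]) + 1  # +1 for the joining space
        if used + cost <= budget:
            kept.add(i)
            used += cost
    if not kept:
        kept.add(ranked[0])  # always keep at least one sentence

    # Emit surviving sentences in their original order.
    return " ".join(sentences[i] for i in sorted(kept))
```

A surrogate produced this way could then be compressed and cached in place of the full document; at query time the snippet generator would select from the surviving sentences only.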

Cite

Tsegay, Y., Puglisi, S. J., Turpin, A., & Zobel, J. (2009). Document compaction for efficient query biased snippet generation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5478 LNCS, pp. 530–537). Springer Verlag. https://doi.org/10.1007/978-3-642-00958-7_45
