Document compaction for efficient query biased snippet generation

Yohannes Tsegay; Simon J. Puglisi; Andrew Turpin; Justin Zobel

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2009), vol. 5478 LNCS, pp. 530–537

DOI: 10.1007/978-3-642-00958-7_45


Abstract

Current web search engines return query-biased snippets for each document they list in a result set. For efficiency, search engines operating on large collections need to cache snippets for common queries, and to cache documents to allow fast generation of snippets for uncached queries. To improve the hit rate on a document cache during snippet generation, we propose and evaluate several schemes for reducing document size, hence increasing the number of documents in the cache. In particular, we argue against further improvements to document compression, and argue for schemes that prune documents based on the a priori likelihood that a sentence will be used as part of a snippet for a given document. Our experiments show that if documents are reduced to less than half their original size, 80% of snippets generated are identical to those generated from the original documents. Moreover, as the pruned, compressed surrogates are smaller, 3-4 times as many documents can be cached. © Springer-Verlag Berlin Heidelberg 2009.
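The pruning idea in the abstract can be illustrated with a minimal sketch. It assumes a document is already split into sentences and that each sentence has a query-independent (a priori) score. The scoring function `prior_score`, which simply favors earlier sentences, and the helpers `compact` and `cache_capacity` are hypothetical names chosen for illustration; they are not the schemes evaluated in the paper.

```python
from typing import Callable, List


def prior_score(index: int, sentence: str) -> float:
    """Hypothetical a priori score favoring early sentences (an assumption, not from the paper)."""
    return 1.0 / (1 + index)


def compact(sentences: List[str], keep_fraction: float,
            score: Callable[[int, str], float] = prior_score) -> List[str]:
    """Keep the highest-scoring sentences that fit in the byte budget, in original order."""
    total = sum(len(s.encode("utf-8")) for s in sentences)
    budget = keep_fraction * total
    # Rank sentence indices by descending prior score; ties are broken by position.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(i, sentences[i]), i))
    kept, used = set(), 0
    for i in ranked:
        size = len(sentences[i].encode("utf-8"))
        if used + size <= budget:
            kept.add(i)
            used += size
    return [sentences[i] for i in sorted(kept)]


def cache_capacity(cache_bytes: int, avg_doc_bytes: float) -> int:
    """Number of documents of a given average size that fit in a fixed-size cache."""
    return int(cache_bytes // avg_doc_bytes)
```

For example, `compact(["aaaa", "bbbb", "cccc", "dddd"], 0.5)` keeps the first two sentences under this positional prior. The capacity helper only restates the arithmetic behind the caching argument: a fixed cache holds proportionally more documents as the average stored size shrinks.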

Citation (APA)

Tsegay, Y., Puglisi, S. J., Turpin, A., & Zobel, J. (2009). Document compaction for efficient query biased snippet generation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5478 LNCS, pp. 530–537). Springer Verlag. https://doi.org/10.1007/978-3-642-00958-7_45
