Current web search engines return query-biased snippets for each document they list in a result set. For efficiency, search engines operating on large collections need to cache snippets for common queries, and to cache documents to allow fast generation of snippets for uncached queries. To improve the hit rate on a document cache during snippet generation, we propose and evaluate several schemes for reducing document size, hence increasing the number of documents in the cache. In particular, we argue against further improvements to document compression, and argue for schemes that prune documents based on the a priori likelihood that a sentence will be used as part of a snippet for a given document. Our experiments show that if documents are reduced to less than half their original size, 80% of snippets generated are identical to those generated from the original documents. Moreover, as the pruned, compressed surrogates are smaller, 3-4 times as many documents can be cached. © Springer-Verlag Berlin Heidelberg 2009.
CITATION STYLE
Tsegay, Y., Puglisi, S. J., Turpin, A., & Zobel, J. (2009). Document compaction for efficient query biased snippet generation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5478 LNCS, pp. 530–537). Springer Verlag. https://doi.org/10.1007/978-3-642-00958-7_45
Mendeley helps you to discover research relevant for your work.