Term frequency quantization for compressing an inverted index

Abstract

In this paper, we investigate quantization-based lossy compression of term frequencies in an inverted index. First, we examine how many bits are needed to code term frequencies with little or no degradation of retrieval performance. Both term-independent and term-specific quantizers are investigated. Next, we describe an iterative technique for learning quantization step sizes. Experiments on standard TREC test sets show that retrieval performance is almost undegraded when only 2 or 3 bits are allocated to the quantized term frequencies. This is comparable to the number of bits used by lossless coding techniques such as unary, γ- and δ-codes. However, if lossless coding is applied to the quantized term frequencies, savings of around 26% (or 12%) over lossless coding alone can be achieved, with less than 2.5% (or no measurable) degradation in retrieval performance. © 2010 Springer-Verlag.
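To make the idea of combining quantization with lossless coding concrete, the following is a minimal sketch under stated assumptions. It uses a hypothetical uniform quantizer with a fixed, hand-picked step size (the paper instead learns step sizes iteratively, and its quantizers may be shaped differently), maps each term frequency to one of 2^b levels, and then Elias γ-codes either the raw values or the quantized levels. The function names `quantize_tf`, `dequantize_tf`, `elias_gamma` and `gamma_bits`, and the toy posting list, are illustrative inventions, not the authors' code or data.

```python
from typing import List


def quantize_tf(tf: int, bits: int, step: int) -> int:
    """Map a raw term frequency (tf >= 1) to one of 2**bits levels.

    Hypothetical uniform quantizer with a fixed step size; the paper's
    quantizers (and their learned step sizes) may differ.
    """
    if tf < 1:
        raise ValueError("term frequency must be >= 1")
    top_level = 2 ** bits - 1
    # Cell k covers tf in [1 + k*step, 1 + (k+1)*step); large tfs share the top cell.
    return min((tf - 1) // step, top_level)


def dequantize_tf(level: int, step: int) -> int:
    """Reconstruct a representative tf: here, the lower edge of the cell (an assumption)."""
    return 1 + level * step


def elias_gamma(n: int) -> str:
    """Elias gamma code of a positive integer: (len - 1) zeros, then its binary form."""
    if n < 1:
        raise ValueError("gamma codes are defined for n >= 1")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_bits(values: List[int]) -> int:
    """Total number of bits needed to gamma-code a list of positive integers."""
    return sum(len(elias_gamma(v)) for v in values)


if __name__ == "__main__":
    # Toy tf values for one posting list (illustrative only, not TREC data).
    tfs = [1, 1, 2, 1, 3, 7, 1, 12, 2, 1]
    levels = [quantize_tf(tf, bits=2, step=2) for tf in tfs]
    print("levels:", levels)
    print("gamma on raw tf:", gamma_bits(tfs))
    # Shift levels by +1 because gamma codes require n >= 1.
    print("gamma on levels (+1):", gamma_bits([q + 1 for q in levels]))
    print("fixed 2-bit codes:", 2 * len(levels))
```

On this toy list, γ-coding the raw term frequencies takes 26 bits, while γ-coding the 2-bit quantized levels takes 20 bits, because most values collapse into the lowest level, which γ codes in a single bit. Whether such savings come without loss of retrieval quality depends on the step sizes and on how reconstructed values feed into the ranking function, which is what the paper's experiments evaluate.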

Citation (APA)

Zheng, L., & Cox, I. J. (2010). Term frequency quantization for compressing an inverted index. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6335 LNCS, pp. 277–287). https://doi.org/10.1007/978-3-642-15470-6_29