Compressing and decoding term statistics time series

Jinfeng Rao; Xing Niu; Jimmy Lin

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9626 675-681

DOI: 10.1007/978-3-319-30671-1_52

Citations: 3

Readers: 8

Abstract

There is growing recognition that temporality plays an important role in information retrieval, particularly for timestamped document collections such as tweets. This paper examines the problem of compressing and decoding term statistics time series, that is, counts of terms within a particular time window across a large document collection. Such data are large (essentially the cross product of the vocabulary and the number of time intervals) but are also sparse, which makes them amenable to compression. We explore various integer compression techniques, starting with a number of coding schemes that are well known in the information retrieval literature, and build toward a novel compression approach based on Huffman codes over blocks of term counts. We show that our Huffman-based methods are able to substantially reduce storage requirements compared to state-of-the-art compression techniques while still maintaining good decoding performance.
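To make the idea of "Huffman codes over blocks of term counts" concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify block size, padding, or how the code table is stored. The fixed block size, the zero padding of the last block, and the function names to_blocks, build_huffman_code, encode, and decode are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import count


def to_blocks(counts, block_size):
    """Split a count series into fixed-size tuples, zero-padding the tail (assumed scheme)."""
    padded = list(counts) + [0] * (-len(counts) % block_size)
    return [tuple(padded[i:i + block_size]) for i in range(0, len(padded), block_size)]


def build_huffman_code(symbols):
    """Return a {symbol: bitstring} prefix code built from symbol frequencies."""
    freq = Counter(symbols)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit codeword.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # unique counter so the heap never compares the dicts
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # Merge the two least frequent subtrees, prefixing their codewords.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


def encode(blocks, code):
    """Concatenate the codeword of each block into one bit string."""
    return "".join(code[b] for b in blocks)


def decode(bits, code, n_values):
    """Invert encode(); n_values trims the zero padding added by to_blocks()."""
    inverse = {c: s for s, c in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:  # prefix-free, so the first match is the codeword
            out.extend(inverse[current])
            current = ""
    return out[:n_values]


# Hypothetical sparse series: counts of one term over 18 time intervals.
series = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0]
blocks = to_blocks(series, block_size=4)
code = build_huffman_code(blocks)
bits = encode(blocks, code)
restored = decode(bits, code, len(series))
```

Because the series is sparse, the all-zero block dominates and receives the shortest codeword, which is where the storage savings in a scheme of this kind would come from. A real system would pack the bits into bytes and account for the size of the code table, both of which this sketch ignores.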

Citation (APA)

Rao, J., Niu, X., & Lin, J. (2016). Compressing and decoding term statistics time series. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9626, pp. 675–681). Springer Verlag. https://doi.org/10.1007/978-3-319-30671-1_52
