Compressing dynamic text collections via phrase-based coding

Abstract

We present a new statistical compression method, which we call Phrase Based Dense Code (PBDC), aimed at compressing large digital libraries. PBDC compresses the text collection to 30-32% of its original size, allows the text to remain compressed at all times, and supports efficient online information retrieval. The novelty of PBDC is that it supports continuous growth of the compressed text collection by automatically adapting the vocabulary both to new words and to changes in the word frequency distribution, without degrading the compression ratio. Text compressed with PBDC can be searched directly, without decompression, using fast Boyer-Moore algorithms. Arbitrary portions of the collection can also be decompressed. Alternative compression methods oriented to information retrieval focus on static collections and are therefore less well suited to digital libraries. © Springer-Verlag Berlin Heidelberg 2005.
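
The abstract does not give PBDC's encoding, so the following is only a rough illustration of the general idea of searching compressed text without decompressing it. It uses End-Tagged Dense Code, a related word-based byte code, as a stand-in rather than PBDC itself. The function names and the sample sentence are hypothetical, and Python's built-in `bytes.find` stands in for a Boyer-Moore-style matcher.

```python
from collections import Counter


def etdc_encode(rank: int) -> bytes:
    """Codeword for the word with the given frequency rank (0 = most frequent)."""
    out = [128 + rank % 128]      # last byte carries the end tag (high bit set)
    rank = rank // 128 - 1
    while rank >= 0:
        out.append(rank % 128)    # all earlier bytes stay below 128
        rank = rank // 128 - 1
    return bytes(reversed(out))


def build_vocabulary(words):
    """Map each distinct word to a codeword; frequent words get shorter codes."""
    ranked = [w for w, _ in Counter(words).most_common()]
    return {w: etdc_encode(i) for i, w in enumerate(ranked)}


def compress(words, vocab):
    """Concatenate the codewords of the words in text order."""
    return b"".join(vocab[w] for w in words)


def find_word(compressed: bytes, codeword: bytes):
    """Yield byte offsets where `codeword` occurs as a whole codeword."""
    start = compressed.find(codeword)
    while start != -1:
        # A true match begins at offset 0 or right after a tagged (final) byte;
        # otherwise it is only the tail of a longer codeword.
        if start == 0 or compressed[start - 1] >= 128:
            yield start
        start = compressed.find(codeword, start + 1)


# Hypothetical sample text, assumed for illustration only.
text = "the cat saw the dog and the dog saw the cat".split()
vocab = build_vocabulary(text)
data = compress(text, vocab)
hits = list(find_word(data, vocab["dog"]))
```

Because the last byte of every codeword is tagged, the search pattern is simply the codeword of the query word, and one check of the preceding byte rules out false matches. Handling a vocabulary that changes as the collection grows, which is the contribution the abstract claims for PBDC, is not covered by this sketch.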

Cite

Brisaboa, N. R., Fariña, A., Navarro, G., & Paramá, J. R. (2005). Compressing dynamic text collections via phrase-based coding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3652 LNCS, pp. 462–474). https://doi.org/10.1007/11551362_41
