We present a new statistical compression method, which we call Phrase Based Dense Code (PBDC), aimed at compressing large digital libraries. PBDC compresses the text collection to 30-32% of its original size, allows the text to be kept in compressed form at all times, and offers efficient on-line information retrieval services. The novelty of PBDC is that it supports continuous growth of the compressed text collection by automatically adapting the vocabulary both to new words and to changes in the word frequency distribution, without degrading the compression ratio. Text compressed with PBDC can be searched directly, without decompression, using fast Boyer-Moore algorithms. It is also possible to decompress arbitrary portions of the collection. Alternative compression methods oriented to information retrieval focus on static collections and are thus less well suited to digital libraries. © Springer-Verlag Berlin Heidelberg 2005.
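The abstract does not describe PBDC's internals, so the following is only a minimal sketch of the general idea behind byte-oriented dense codes and direct search on compressed text. It assumes a simpler, related scheme (End-Tagged Dense Code over single words with a static vocabulary), not PBDC itself, which works on phrases and adapts its vocabulary over time. All function names are hypothetical, and Python's built-in `bytes.find` stands in for a Boyer-Moore-type scan.

```python
from collections import Counter


def etdc_codeword(rank):
    """End-Tagged Dense Code for a 0-based frequency rank.

    The last byte of every codeword is >= 128 (the "end tag");
    all earlier bytes are < 128. Illustrative only, not PBDC.
    """
    out = [128 + rank % 128]          # final byte carries the end tag
    rank //= 128
    while rank > 0:
        rank -= 1
        out.append(rank % 128)        # non-final bytes stay below 128
        rank //= 128
    return bytes(reversed(out))


def build_codebook(words):
    """Assign shorter codewords to more frequent words (static vocabulary)."""
    freq = Counter(words)
    ranked = sorted(freq, key=lambda w: (-freq[w], w))
    return {w: etdc_codeword(r) for r, w in enumerate(ranked)}


def compress(words, codebook):
    """Concatenate the codewords of a word sequence."""
    return b"".join(codebook[w] for w in words)


def find_word(data, code):
    """Return byte offsets where `code` occurs as a whole codeword.

    A raw byte match is genuine only if it starts at offset 0 or right
    after an end-tagged byte; otherwise it is the tail of a longer code.
    """
    hits = []
    pos = data.find(code)
    while pos != -1:
        if pos == 0 or data[pos - 1] >= 128:
            hits.append(pos)
        pos = data.find(code, pos + 1)
    return hits


words = "the cat and the dog and the bird".split()
codebook = build_codebook(words)
data = compress(words, codebook)
hits = find_word(data, codebook["the"])
```

Because every codeword ends in a tagged byte, the searcher can verify a candidate match by inspecting a single preceding byte, which is what makes scanning the compressed stream directly practical in this kind of code.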
CITATION STYLE
Brisaboa, N. R., Fariña, A., Navarro, G., & Paramá, J. R. (2005). Compressing dynamic text collections via phrase-based coding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3652 LNCS, pp. 462–474). https://doi.org/10.1007/11551362_41