Compression of concordances in full-text retrieval systems

Yaacov Choueka; Aviezri S. Fraenkel; Shmuel T. Klein

Conference Proceedings


Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1988 (1988) 597-612

DOI: 10.1145/62437.62500


Abstract

The concordance of a full-text information retrieval system contains, for every distinct word W of the database, a list L(W) of "coordinates", each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only to save storage space but also to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods for efficiently compressing the concordances of large full-text retrieval systems are presented. Tested on the concordance of the Responsa Retrieval Project, they yield savings of up to 49% relative to the uncompressed file, a relative improvement of about 27% over the prefix-omission compression technique currently in use.
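To make the notions of a coordinate list L(W) and prefix omission more concrete, here is a minimal Python sketch. It assumes, for illustration only, that a coordinate is a tuple of integer fields ordered from the coarsest to the finest level (a hypothetical (document, paragraph, sentence, word) layout) and that L(W) is sorted. The names `pom_encode` and `pom_decode` are hypothetical; the sketch shows the general prefix-omission idea, not the encoding actually used in the Responsa Retrieval Project.

```python
from typing import List, Tuple

# Hypothetical coordinate layout, assumed for illustration only:
# (document, paragraph, sentence, word), each a non-negative integer.
Coordinate = Tuple[int, ...]
Encoded = Tuple[int, Coordinate]  # (number of shared leading fields, remaining fields)


def pom_encode(coords: List[Coordinate]) -> List[Encoded]:
    """Prefix omission: for each coordinate, record how many leading fields
    it shares with the previous coordinate and keep only the remaining fields."""
    encoded: List[Encoded] = []
    prev: Coordinate = ()
    for c in coords:
        shared = 0
        # Count leading fields identical to those of the previous coordinate.
        while shared < min(len(c), len(prev)) and c[shared] == prev[shared]:
            shared += 1
        encoded.append((shared, c[shared:]))
        prev = c
    return encoded


def pom_decode(encoded: List[Encoded]) -> List[Coordinate]:
    """Rebuild each coordinate from the previous one's prefix plus the stored suffix."""
    coords: List[Coordinate] = []
    prev: Coordinate = ()
    for shared, suffix in encoded:
        c = prev[:shared] + tuple(suffix)
        coords.append(c)
        prev = c
    return coords


if __name__ == "__main__":
    # A made-up sorted list L(W) for some word W.
    L = [(3, 12, 1, 5), (3, 12, 4, 2), (3, 40, 1, 7), (9, 2, 2, 1)]
    enc = pom_encode(L)
    print(enc)
    assert pom_decode(enc) == L
```

Because consecutive coordinates in a sorted list often share their leading fields, only the differing suffix needs to be stored. A practical implementation would presumably also pack the shared-field count and the remaining fields into compact bit fields rather than Python tuples.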

Citation (APA)

Choueka, Y., Fraenkel, A. S., & Klein, S. T. (1988). Compression of concordances in full-text retrieval systems. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1988 (pp. 597–612). Association for Computing Machinery, Inc. https://doi.org/10.1145/62437.62500
