The concordance of a full-text information retrieval system contains, for every distinct word W of the database, a list L(W) of "coordinates", each of which describes the exact location of an occurrence of W in the text. The concordance should be compressed, not only to save storage space but also to reduce the number of I/O operations, since the file is usually kept in secondary memory. Several methods are presented that efficiently compress the concordances of large full-text retrieval systems. The methods were tested on the concordance of the Responsa Retrieval Project and yield savings of up to 49% relative to the uncompressed file; this is a relative improvement of about 27% over the currently used prefix-omission compression technique.
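To make the baseline concrete, the following is a minimal sketch of the general idea behind prefix omission on a sorted coordinate list. It is an illustration under stated assumptions, not the encoding used in the paper: the coordinate layout (document, paragraph, sentence, word) and the names `pom_encode` and `pom_decode` are hypothetical. Each coordinate is stored as a count of leading fields shared with the previous coordinate, followed by only the fields that differ.

```python
from typing import List, Tuple

# Hypothetical coordinate layout: (document, paragraph, sentence, word).
Coord = Tuple[int, ...]
Encoded = Tuple[int, Coord]  # (number of omitted leading fields, remaining fields)


def pom_encode(coords: List[Coord]) -> List[Encoded]:
    """Drop the leading fields each coordinate shares with its predecessor."""
    encoded: List[Encoded] = []
    prev: Coord = ()
    for c in coords:
        k = 0
        # Count how many leading fields repeat the previous coordinate.
        while k < len(prev) and k < len(c) and prev[k] == c[k]:
            k += 1
        encoded.append((k, c[k:]))
        prev = c
    return encoded


def pom_decode(encoded: List[Encoded]) -> List[Coord]:
    """Rebuild full coordinates by copying the omitted prefix from the predecessor."""
    coords: List[Coord] = []
    prev: Coord = ()
    for k, suffix in encoded:
        c = prev[:k] + tuple(suffix)
        coords.append(c)
        prev = c
    return coords
```

Because the list L(W) is sorted, consecutive coordinates tend to share their high-order fields, so most entries shrink to a small count plus one or two fields. A real implementation would additionally pack the count and the remaining fields into a compact bit or byte layout, which this sketch leaves out.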
Choueka, Y., Fraenkel, A. S., & Klein, S. T. (1988). Compression of concordances in full-text retrieval systems. In Proceedings of the 11th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 1988 (pp. 597–612). Association for Computing Machinery, Inc. https://doi.org/10.1145/62437.62500