Using phrases in similarity computations can enhance search effectiveness, but the gain comes at a cost: either in index size, if all word tuples are treated as queryable objects, or in processing time, if postings lists for phrases are constructed at query time. It is also unclear which phrases are “interesting”, in the sense of capturing useful information. Here we explore several techniques for recognizing phrases using statistics of large-scale collections, and evaluate their quality.
CITATION STYLE
Gog, S., Moffat, A., & Petri, M. (2015). On identifying phrases using collection statistics. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9022, pp. 278–283). Springer Verlag. https://doi.org/10.1007/978-3-319-16354-3_30