Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to comparing context vectors extracted from large corpora scales poorly (O(n²) in the vocabulary size). In this paper, we compare several existing approaches to approximating the nearest-neighbour search for distributional similarity. We investigate the trade-off between efficiency and accuracy, and find that SASH (Houle and Sakuma, 2005) provides the best balance. © 2006 Association for Computational Linguistics.
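To illustrate why the naïve approach is quadratic, the following minimal sketch compares every word's context vector against every other word's. It is not the paper's implementation: the cosine measure, the function names, and the toy context counts are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse context vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector
    dot = sum(weight * v.get(feature, 0.0) for feature, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def naive_nearest_neighbours(vectors, k=1):
    """Brute-force k-nearest-neighbour search over all word pairs.

    Each of the n words is compared with the other n - 1 words,
    so the number of similarity computations is n * (n - 1).
    """
    comparisons = 0
    result = {}
    for word, vec in vectors.items():
        scored = []
        for other, other_vec in vectors.items():
            if other == word:
                continue
            scored.append((cosine(vec, other_vec), other))
            comparisons += 1
        # Highest similarity first; ties broken alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        result[word] = [other for _, other in scored[:k]]
    return result, comparisons


# Hypothetical toy context counts (not data from the paper).
vectors = {
    "car": {"drive": 3, "road": 2},
    "automobile": {"drive": 2, "road": 1},
    "banana": {"eat": 4, "peel": 1},
    "apple": {"eat": 3, "tree": 1},
}

neighbours, comparisons = naive_nearest_neighbours(vectors)
print(neighbours)
print(comparisons)
```

With a vocabulary of four words the loop performs 12 comparisons. The count grows with the square of the vocabulary size, which is the cost that approximate methods aim to reduce.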
Gorman, J., & Curran, J. R. (2006). Scaling distributional similarity to large corpora. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 361–368). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220175.1220221