Scaling distributional similarity to large corpora

Abstract

Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to comparing context vectors extracted from large corpora scales poorly: it is O(n²) in the vocabulary size n. In this paper, we compare several existing approaches to approximating the nearest-neighbour search for distributional similarity. We investigate the trade-off between efficiency and accuracy, and find that SASH (Houle and Sakuma, 2005) provides the best balance. © 2006 Association for Computational Linguistics.
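
The following is a minimal sketch of the naïve all-pairs search, showing where the quadratic cost comes from. It assumes that each word's context vector is a sparse Python dict mapping hypothetical context features to weights. It uses cosine similarity purely for illustration, since the abstract does not specify the weighting or similarity functions. The helper names `cosine` and `naive_nearest_neighbours` and the toy data are illustrative only and are not the authors' implementation. Comparing every pair of n words requires n(n-1)/2 similarity computations.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation: context feature -> weight (sparse vector).
ContextVector = Dict[str, float]


def cosine(u: ContextVector, v: ContextVector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    if len(u) > len(v):
        u, v = v, u  # iterate over the shorter vector for the dot product
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def naive_nearest_neighbours(
    vectors: Dict[str, ContextVector], k: int = 1
) -> Tuple[Dict[str, List[str]], int]:
    """Exhaustive k-nearest-neighbour search over every pair of words.

    Returns the k most similar words for each word and the number of
    similarity computations performed, which is n * (n - 1) / 2.
    """
    words = list(vectors)
    scored: Dict[str, List[Tuple[float, str]]] = {w: [] for w in words}
    comparisons = 0
    for i in range(len(words)):
        for j in range(i + 1, len(words)):  # each unordered pair once
            s = cosine(vectors[words[i]], vectors[words[j]])
            comparisons += 1
            scored[words[i]].append((s, words[j]))
            scored[words[j]].append((s, words[i]))
    neighbours = {
        # highest similarity first; ties broken alphabetically
        w: [other for _, other in sorted(scored[w], key=lambda p: (-p[0], p[1]))[:k]]
        for w in words
    }
    return neighbours, comparisons


# Toy context vectors with made-up grammatical-relation features.
vectors = {
    "dog": {"obj:walk": 2.0, "subj:bark": 3.0, "adj:loyal": 1.0},
    "cat": {"obj:feed": 2.0, "subj:purr": 3.0, "adj:loyal": 0.5, "obj:walk": 0.5},
    "puppy": {"obj:walk": 1.5, "subj:bark": 2.0},
    "kitten": {"obj:feed": 1.0, "subj:purr": 2.0},
}
neighbours, comparisons = naive_nearest_neighbours(vectors)
print(neighbours)   # {'dog': ['puppy'], 'cat': ['kitten'], 'puppy': ['dog'], 'kitten': ['cat']}
print(comparisons)  # 6
```

Doubling the vocabulary roughly quadruples the number of comparisons. This is the scaling problem that approximate nearest-neighbour methods such as SASH aim to reduce.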

Citation (APA)

Gorman, J., & Curran, J. R. (2006). Scaling distributional similarity to large corpora. In COLING/ACL 2006 - 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference (Vol. 1, pp. 361–368). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1220175.1220221