Frequency estimates for statistical word similarity measures

Egidio Terra; C. L.A. Clarke

Conference Proceedings, Open Access

Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003 (2003)

DOI: 10.3115/1073445.1073477

141 Citations

170 Readers

Abstract

Statistical measures of word similarity have applications in many areas of natural language processing, such as language modeling and information retrieval. We report a comparative study of two methods for estimating the word co-occurrence frequencies required by word similarity measures. Our frequency estimates are generated from a terabyte-sized corpus of Web data, and we study the impact of corpus size on the effectiveness of the measures. We base the evaluation on one TOEFL question set and two practice question sets, each consisting of multiple-choice questions that ask for the best synonym of a given target word. For two of the question sets, a context for the target word is provided, and we examine a number of word similarity measures that exploit this context. Our best combination of similarity measure and frequency estimation method answers 6-8% more questions than the best results previously reported for the same question sets.
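As a rough illustration of how a co-occurrence-based similarity measure can answer a multiple-choice synonym question, the sketch below scores each candidate against the target word and picks the highest-scoring one. The abstract does not say which measures or estimation methods were compared, so pointwise mutual information (PMI) is used here only as one well-known example of such a measure. The function names `pmi` and `best_synonym` and all counts are hypothetical and made up for the example; they are not taken from the paper.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information from raw counts.

    count_xy: number of windows (or documents) containing both words
    count_x, count_y: number containing each word on its own
    total: total number of windows (or documents)
    Returns -inf when any count is zero, so unseen pairs rank last.
    """
    if count_xy == 0 or count_x == 0 or count_y == 0:
        return float("-inf")
    return math.log2((count_xy * total) / (count_x * count_y))


def best_synonym(target, choices, word_counts, pair_counts, total):
    """Return the choice with the highest PMI against the target word."""
    def score(choice):
        pair = frozenset((target, choice))
        return pmi(
            pair_counts.get(pair, 0),
            word_counts.get(target, 0),
            word_counts.get(choice, 0),
            total,
        )
    return max(choices, key=score)


# Made-up counts for illustration only (not data from the paper).
word_counts = {"big": 1000, "large": 800, "small": 900, "green": 500}
pair_counts = {
    frozenset(("big", "large")): 120,
    frozenset(("big", "small")): 60,
    frozenset(("big", "green")): 10,
}
total = 100_000

answer = best_synonym("big", ["large", "small", "green"],
                      word_counts, pair_counts, total)
print(answer)
```

How the counts are obtained (for example, co-occurrence within a fixed window versus within a whole document) changes the scores, which is the kind of frequency estimation choice the abstract says the study compares.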

Citation (APA)

Terra, E., & Clarke, C. L. A. (2003). Frequency estimates for statistical word similarity measures. In Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, HLT-NAACL 2003. Association for Computational Linguistics (ACL). https://doi.org/10.3115/1073445.1073477
