A framework for the construction of monolingual and cross-lingual word similarity datasets

José Camacho-Collados; Mohammad Taher Pilehvar; Roberto Navigli

Conference Proceedings, Open Access

ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (2015) 2 1-7

DOI: 10.3115/v1/p15-2001

48 Citations

122 Readers

Abstract

Despite being one of the most popular tasks in lexical semantics, word similarity has often been limited to the English language. Other languages, even widely spoken ones such as Spanish, do not have a reliable word similarity evaluation framework. We put forward robust methodologies for the extension of existing English datasets to other languages, at both the monolingual and cross-lingual levels. We propose an automatic standardization for the construction of cross-lingual similarity datasets, and provide an evaluation demonstrating its reliability and robustness. Based on our procedure and taking the RG-65 word similarity dataset as a reference, we release two high-quality Spanish and Farsi (Persian) monolingual datasets, and fifteen cross-lingual datasets for six languages: English, Spanish, French, German, Portuguese, and Farsi.

Cite (APA)

Camacho-Collados, J., Pilehvar, M. T., & Navigli, R. (2015). A framework for the construction of monolingual and cross-lingual word similarity datasets. In ACL-IJCNLP 2015 - 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, Proceedings of the Conference (Vol. 2, pp. 1–7). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/p15-2001
