We suggest a new method for creating and using gold-standard datasets for word similarity evaluation. Our goal is to improve the reliability of the evaluation, and we do this by redesigning the annotation task to achieve higher inter-rater agreement, and by defining a performance measure which takes the reliability of each annotation decision in the dataset into account.
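The core idea of the performance measure can be illustrated with a small sketch. This is an assumption-laden toy version, not the paper's actual formulation: it simply weights each gold-standard annotation decision by its inter-annotator agreement, so that decisions annotators disagreed on contribute less to the model's score.

```python
# Hypothetical illustration of a reliability-weighted evaluation score.
# Each decision is (model_correct, agreement), where agreement in [0, 1]
# is the fraction of annotators who supported the gold decision.
# NOTE: this is a simplified stand-in, not the measure defined in the paper.

def weighted_score(decisions):
    """Return agreement-weighted accuracy over annotation decisions."""
    total = sum(w for _, w in decisions)
    if total == 0:
        return 0.0
    return sum(w for correct, w in decisions if correct) / total

# Example: two high-agreement decisions the model gets right,
# one low-agreement decision it gets wrong.
decisions = [(True, 0.9), (True, 0.8), (False, 0.3)]
print(round(weighted_score(decisions), 3))  # 0.85
```

Under this weighting, an error on a decision annotators barely agreed on (0.3) costs far less than an error on a near-unanimous one would.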
Avraham, O., & Goldberg, Y. (2016). Improving reliability of word similarity evaluation by redesigning annotation task and performance measure. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 106–110). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2519