Intrinsic evaluation of word vectors fails to predict extrinsic performance

64 citations · 132 Mendeley readers

Abstract

The quality of word representations is frequently assessed using correlation with human judgements of word similarity. Here, we question whether such intrinsic evaluation can predict the merits of the representations for downstream tasks. We study the correlation between results on ten word similarity benchmarks and tagger performance on three standard sequence labeling tasks using a variety of word vectors induced from an unannotated corpus of 3.8 billion words, and demonstrate that most intrinsic evaluations are poor predictors of downstream performance. We argue that this issue can be traced in part to a failure to distinguish specific similarity from relatedness in intrinsic evaluation datasets. We make our evaluation tools openly available to facilitate further study.

Citation (APA)

Chiu, B., Korhonen, A., & Pyysalo, S. (2016). Intrinsic evaluation of word vectors fails to predict extrinsic performance. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1–6). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w16-2501
