Pitfalls in the evaluation of sentence embeddings


Abstract

Deep learning models continuously break new records across different NLP tasks. At the same time, their success exposes weaknesses of model evaluation. Here, we compile several key pitfalls in the evaluation of sentence embeddings, a currently very popular NLP paradigm. These pitfalls include the comparison of embeddings of different sizes, the normalization of embeddings, and the low (and diverging) correlations between transfer and probing tasks. Our motivation is to challenge the current evaluation of sentence embeddings and to provide an easy-to-access reference for future research. Based on our insights, we also recommend better practices for future evaluations of sentence embeddings.
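To make the normalization pitfall concrete, below is a minimal sketch (an illustration under assumed conditions, not code from the paper): it uses NumPy, toy random "sentence embeddings", and two hypothetical helper functions (dot_ranking, cosine_ranking) to show that dot-product-based rankings are sensitive to embedding norms, while rankings after L2 normalization (cosine similarity) are not.

```python
# Minimal sketch (not from the paper): how normalization changes
# similarity-based evaluation of sentence embeddings.
import numpy as np

rng = np.random.default_rng(0)

# Toy "sentence embeddings": 5 sentences, 300 dimensions.
E = rng.normal(size=(5, 300))

def dot_ranking(emb, query_idx):
    """Rank sentences by dot product with the query sentence."""
    scores = emb @ emb[query_idx]
    return np.argsort(-scores)

def cosine_ranking(emb, query_idx):
    """Rank sentences by cosine similarity, i.e. after L2 normalization."""
    normed = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return dot_ranking(normed, query_idx)

# Rescaling one embedding (e.g., a sentence built from large-norm word
# vectors) can change dot-product rankings, but cosine rankings are
# invariant to per-vector scaling.
E_scaled = E.copy()
E_scaled[3] *= 10.0

print(dot_ranking(E, 0), dot_ranking(E_scaled, 0))        # may differ
print(cosine_ranking(E, 0), cosine_ranking(E_scaled, 0))  # identical
```

This is one reason evaluation papers should report whether embeddings were normalized before scoring: two otherwise identical setups can yield different results depending on that single preprocessing choice.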

Citation (APA)

Eger, S., Rücklé, A., & Gurevych, I. (2019). Pitfalls in the evaluation of sentence embeddings. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP-2019) (pp. 55–60). Association for Computational Linguistics. https://doi.org/10.18653/v1/w19-4308
