Are we estimating or guesstimating translation quality?

24 citations · 113 Mendeley readers

Abstract

Recent advances in pre-trained multilingual language models have led to state-of-the-art results on the task of quality estimation (QE) for machine translation. A carefully engineered ensemble of such models won the QE shared task at WMT19. Our in-depth analysis, however, shows that the success of pre-trained language models for QE is overestimated due to three issues we observed in current QE datasets: (i) the distributions of quality scores are imbalanced and skewed towards good quality scores; (ii) QE models can perform well on these datasets while looking only at the source or only at the translated sentences; (iii) the datasets contain statistical artifacts that correlate well with human-annotated QE labels. Our findings suggest that although QE models may capture the fluency of translated sentences and the complexity of source sentences, they cannot effectively model the adequacy of translations.
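The three dataset issues above lend themselves to simple diagnostics. Below is a minimal Python sketch, not taken from the paper, of two such checks: measuring how skewed the quality labels are (issue i) and correlating a trivial surface feature of the translation alone with the labels, the kind of statistical artifact a partial-input model could exploit (issues ii and iii). The `diagnose` function and the toy inputs are illustrative assumptions.

```python
# Minimal diagnostic sketch (illustrative, not the paper's code).
import numpy as np
from scipy.stats import pearsonr, skew

def diagnose(scores, translations):
    """Probe a QE dataset for label skew and a target-only artifact.

    scores: human-annotated quality labels, one per segment.
    translations: the corresponding MT output strings.
    """
    scores = np.asarray(scores, dtype=float)

    # Issue (i): skewed label distribution. A large negative skew means
    # the data is dominated by high (good) quality scores.
    print(f"score skewness: {skew(scores):.3f}")

    # Issues (ii)/(iii): a feature computed from the translation alone.
    # If translation length already correlates with the labels, a model
    # can score well while never comparing translation to source.
    lengths = np.array([len(t.split()) for t in translations], dtype=float)
    r, p = pearsonr(lengths, scores)
    print(f"length-vs-score Pearson r: {r:+.3f} (p = {p:.3g})")

# Toy call with made-up segments; real use would load an actual QE dataset.
diagnose(
    scores=[0.92, 0.88, 0.95, 0.41, 0.90, 0.86],
    translations=[
        "the committee approved the proposal",
        "she will travel to Berlin next week",
        "the results confirm the earlier findings",
        "house blue the is of",
        "he finished reading the report",
        "they agreed to meet on Monday",
    ],
)
```

On real QE data, such as the WMT19 shared-task sets, a strong skew or a sizable target-only correlation would support the paper's argument that models can score well without modeling adequacy.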

Citation (APA)

Sun, S., Guzmán, F., & Specia, L. (2020). Are we estimating or guesstimating translation quality? In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 6262–6267). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.558
