Efficient elicitation of annotations for human evaluation of machine translation


Abstract

A main output of the annual Workshop on Statistical Machine Translation (WMT) is a ranking of the systems that participated in its shared translation tasks, produced by aggregating pairwise sentence-level comparisons collected from human judges. Over the past few years, there have been a number of tweaks to the aggregation formula in attempts to address issues arising from the inherent ambiguity and subjectivity of the task, as well as weaknesses in the proposed models and the manner of model selection. We continue this line of work by adapting the TrueSkill™ algorithm, an online approach for modeling the relative skills of players in ongoing competitions such as Microsoft's Xbox Live, to the human evaluation of machine translation output. Our experimental results show that TrueSkill outperforms other recently proposed models on accuracy, and can also significantly reduce the number of pairwise annotations that need to be collected by sampling non-uniformly from the space of system competitions.
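The core idea can be illustrated with a minimal sketch of the standard two-player, no-draw TrueSkill update, where each system's skill is a Gaussian (mu, sigma) that is shifted and sharpened after each pairwise judgment. This is an illustrative simplification, not the paper's exact implementation; the performance-noise parameter `beta` and the default prior (25, 25/3) follow the conventions of the original TrueSkill system, and `match_quality` is the usual closeness heuristic that motivates sampling informative system pairings non-uniformly.

```python
import math

def _pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def trueskill_update(winner, loser, beta=25.0 / 6.0):
    """One TrueSkill update for a pairwise comparison with no draws.

    winner, loser: (mu, sigma) skill estimates before the comparison.
    Returns the updated (mu, sigma) pairs: the winner's mean rises,
    the loser's falls, and both uncertainties shrink.
    """
    mu_w, sigma_w = winner
    mu_l, sigma_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sigma_w ** 2 + sigma_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # mean-shift factor
    w = v * (v + t)         # variance-shrink factor, in (0, 1)
    mu_w_new = mu_w + (sigma_w ** 2 / c) * v
    mu_l_new = mu_l - (sigma_l ** 2 / c) * v
    sigma_w_new = sigma_w * math.sqrt(max(1.0 - (sigma_w ** 2 / c ** 2) * w, 1e-9))
    sigma_l_new = sigma_l * math.sqrt(max(1.0 - (sigma_l ** 2 / c ** 2) * w, 1e-9))
    return (mu_w_new, sigma_w_new), (mu_l_new, sigma_l_new)

def match_quality(a, b, beta=25.0 / 6.0):
    """Heuristic for picking informative pairings: highest when the two
    skill estimates are close, so the outcome is most uncertain."""
    mu_a, sigma_a = a
    mu_b, sigma_b = b
    c2 = 2.0 * beta ** 2 + sigma_a ** 2 + sigma_b ** 2
    return math.sqrt(2.0 * beta ** 2 / c2) * math.exp(-((mu_a - mu_b) ** 2) / (2.0 * c2))
```

For example, starting two systems at the conventional prior (25, 25/3), a single judgment preferring system A moves A's mean above 25 and B's below it, while both sigmas decrease; preferring pairs with high `match_quality` is one way to concentrate annotation effort where comparisons are most informative.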

Citation (APA)

Sakaguchi, K., Post, M., & Van Durme, B. (2014). Efficient elicitation of annotations for human evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1–11). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3301
