Abstract
A main output of the annual Workshop on Statistical Machine Translation (WMT) is a ranking of the systems that participated in its shared translation tasks, produced by aggregating pairwise sentence-level comparisons collected from human judges. Over the past few years, there have been a number of tweaks to the aggregation formula in attempts to address issues arising from the inherent ambiguity and subjectivity of the task, as well as weaknesses in the proposed models and the manner of model selection. We continue this line of work by adapting the TrueSkill™ algorithm, an online approach for modeling the relative skills of players in ongoing competitions such as Microsoft's Xbox Live, to the human evaluation of machine translation output. Our experimental results show that TrueSkill outperforms other recently proposed models on accuracy, and also can significantly reduce the number of pairwise annotations that need to be collected by sampling non-uniformly from the space of system competitions.
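The core idea, treating each MT system as a "player" whose latent skill is a Gaussian updated after every pairwise human judgment, can be sketched as follows. This is a minimal, draw-free TrueSkill-style update written for illustration; the constants (`MU0`, `SIGMA0`, `BETA`) follow the common defaults from the original TrueSkill formulation, not necessarily the settings used in this paper, and `trueskill_update` is a hypothetical helper name.

```python
import math

# Common TrueSkill-style defaults (assumed, not taken from the paper).
MU0 = 25.0          # prior mean skill
SIGMA0 = MU0 / 3.0  # prior skill uncertainty
BETA = MU0 / 6.0    # per-comparison performance noise

def _pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def trueskill_update(winner, loser, beta=BETA):
    """One online update from a single pairwise judgment.

    winner, loser: (mu, sigma) tuples; returns updated tuples.
    """
    mu_w, sig_w = winner
    mu_l, sig_l = loser
    # Total uncertainty over the performance difference.
    c = math.sqrt(2.0 * beta**2 + sig_w**2 + sig_l**2)
    t = (mu_w - mu_l) / c
    # Mean shift (v) and variance-shrink (w) factors for a win.
    v = _pdf(t) / _cdf(t)
    w = v * (v + t)
    new_winner = (mu_w + (sig_w**2 / c) * v,
                  sig_w * math.sqrt(max(1.0 - (sig_w**2 / c**2) * w, 1e-9)))
    new_loser = (mu_l - (sig_l**2 / c) * v,
                 sig_l * math.sqrt(max(1.0 - (sig_l**2 / c**2) * w, 1e-9)))
    return new_winner, new_loser

# Example: two systems start at the prior; system A wins one comparison.
sys_a, sys_b = (MU0, SIGMA0), (MU0, SIGMA0)
sys_a, sys_b = trueskill_update(sys_a, sys_b)
```

After one judgment, the winner's mean rises, the loser's falls, and both uncertainties shrink; the paper's annotation-saving idea is to preferentially sample comparisons between systems whose skill distributions still overlap heavily.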
Sakaguchi, K., Post, M., & Van Durme, B. (2014). Efficient elicitation of annotations for human evaluation of machine translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1–11). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-3301