This paper presents the results of the WMT11 shared tasks, which included a translation task, a system combination task, and a task for machine translation evaluation metrics. We conducted a large-scale manual evaluation of 148 machine translation systems and 41 system combination entries. We used the ranking of these systems to measure how strongly automatic metrics correlate with human judgments of translation quality for 21 evaluation metrics. This year featured a Haitian Creole to English task translating SMS messages sent to an emergency response service in the aftermath of the Haitian earthquake. We also conducted a pilot 'tunable metrics' task to test whether optimizing a fixed system to different metrics would result in perceptibly different translation quality.
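To make the metrics evaluation concrete: the comparison between automatic metrics and human judgments is typically a system-level rank correlation. Below is a minimal illustrative sketch (not the paper's exact protocol) computing Spearman's rank correlation between hypothetical per-system human scores and per-system metric scores; all values and variable names are placeholders.

```python
# Sketch: system-level agreement between an automatic metric and human judgments,
# measured with Spearman's rank correlation. Scores are made-up placeholder values.
from scipy.stats import spearmanr

# Hypothetical per-system scores (higher = better for both columns).
human_scores  = [0.62, 0.55, 0.71, 0.48, 0.66]   # e.g. share of pairwise wins in manual ranking
metric_scores = [23.1, 21.4, 25.0, 19.8, 22.7]   # e.g. automatic metric scores for the same systems

rho, p_value = spearmanr(human_scores, metric_scores)
print(f"Spearman rho = {rho:.3f} (p = {p_value:.3f})")
```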