An important limitation of automatic evaluation metrics is that, when comparing Machine Translation (MT) output to a human reference, they are often unable to discriminate between acceptable variation and differences that are indicative of MT errors. In this paper we present the UPF-Cobalt evaluation system, which addresses this issue by penalizing differences in the syntactic contexts of aligned candidate and reference words. We evaluate our metric on data from recent WMT workshops and show that it performs competitively at both the segment and the system level.
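To illustrate the core idea of penalizing syntactic-context differences between aligned words, here is a minimal, hypothetical sketch. It assumes each aligned word carries a simplified "syntactic context" consisting of its head lemma and dependency relation; the function names, the penalty weight, and the scoring scheme are illustrative assumptions, not the actual UPF-Cobalt implementation described in the paper.

```python
# Toy sketch only: UPF-Cobalt's real scoring is more elaborate (lexical
# similarity weighting, parser-derived features, etc.). Here a word's
# "syntactic context" is approximated as (head_lemma, dependency_relation).

def context_penalty(cand_ctx, ref_ctx, weight=0.5):
    """Return a penalty in [0, weight] for mismatched syntactic contexts."""
    head_match = cand_ctx[0] == ref_ctx[0]
    rel_match = cand_ctx[1] == ref_ctx[1]
    mismatches = (not head_match) + (not rel_match)
    return weight * mismatches / 2

def score_alignment(aligned_pairs):
    """Score a candidate against a reference from word alignments.

    aligned_pairs: list of (cand_ctx, ref_ctx) tuples for aligned words.
    Each aligned word contributes 1.0 minus its context penalty, so a
    word aligned in a divergent syntactic context counts for less.
    """
    if not aligned_pairs:
        return 0.0
    total = sum(1.0 - context_penalty(c, r) for c, r in aligned_pairs)
    return total / len(aligned_pairs)

# Two aligned words: one with an identical context, one whose head differs.
pairs = [
    (("eat", "obj"), ("eat", "obj")),       # identical context: no penalty
    (("see", "nsubj"), ("watch", "nsubj")),  # head mismatch: penalized
]
print(score_alignment(pairs))  # 0.875
```

The second pair still counts toward the score (the words are aligned), but its contribution is reduced because the aligned words attach to different heads, mirroring the intuition that matching words in divergent syntactic contexts may signal an MT error rather than acceptable variation.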
CITATION STYLE
Fomicheva, M., Bel, N., Da Cunha, I., & Malinovskiy, A. (2015). UPF-Cobalt submission to WMT15 Metrics Task. In Proceedings of the Tenth Workshop on Statistical Machine Translation (WMT 2015), at the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015) (pp. 373–379). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w15-3046