Robust machine translation evaluation with entailment features


Abstract

Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT output based on a rich set of features motivated by textual entailment, such as lexical-semantic (in-)compatibility and argument structure overlap. We compare this metric against a combination metric of four state-of-the-art scores (BLEU, NIST, TER, and METEOR) in two different settings. The combination metric outperforms the individual scores, but is bested by the entailment-based metric. Combining the entailment and traditional features yields further improvements. © 2009 ACL and AFNLP.
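The combination metric mentioned in the abstract can be sketched as a learned weighting of per-segment component scores. The snippet below is a minimal illustration only: it fits linear weights over hypothetical (BLEU, NIST, 1-TER, METEOR) segment scores against invented human ratings via least squares. The paper itself uses a richer entailment feature set and a different learner; all function names and numbers here are assumptions for illustration.

```python
# Minimal sketch of a combination MT metric: learn weights over
# per-segment scores from component metrics (BLEU, NIST, 1-TER,
# METEOR) against human quality judgments. All data is invented;
# the actual paper uses richer entailment features.
import numpy as np

def fit_combination_weights(metric_scores, human_scores):
    """Fit linear weights (plus a bias term) mapping component
    metric scores to human judgments via least squares."""
    X = np.asarray(metric_scores, dtype=float)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # bias column
    y = np.asarray(human_scores, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def combined_score(weights, segment_scores):
    """Score one candidate segment with the learned combination."""
    x = np.append(np.asarray(segment_scores, dtype=float), 1.0)
    return float(x @ weights)

# Hypothetical training data: rows are segments, columns are
# (BLEU, NIST, 1-TER, METEOR); targets are human adequacy ratings.
scores = [[0.30, 0.55, 0.60, 0.45],
          [0.50, 0.70, 0.75, 0.65],
          [0.10, 0.40, 0.40, 0.30],
          [0.65, 0.80, 0.85, 0.75]]
ratings = [2.5, 3.8, 1.6, 4.4]

w = fit_combination_weights(scores, ratings)
print(round(combined_score(w, [0.40, 0.60, 0.65, 0.55]), 2))
```

An entailment-based variant would simply extend the feature columns with scores such as lexical-semantic compatibility and argument-structure overlap before fitting.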

APA citation

Padó, S., Galley, M., Jurafsky, D., & Manning, C. (2009). Robust machine translation evaluation with entailment features. In ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. (pp. 297–305). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1687878.1687922
