Robust machine translation evaluation with entailment features


Abstract

Existing evaluation metrics for machine translation lack crucial robustness: their correlations with human quality judgments vary considerably across languages and genres. We believe that the main reason is their inability to properly capture meaning: A good translation candidate means the same thing as the reference translation, regardless of formulation. We propose a metric that evaluates MT output based on a rich set of features motivated by textual entailment, such as lexical-semantic (in-)compatibility and argument structure overlap. We compare this metric against a combination metric of four state-of-the-art scores (BLEU, NIST, TER, and METEOR) in two different settings. The combination metric outperforms the individual scores, but is bested by the entailment-based metric. Combining the entailment and traditional features yields further improvements. © 2009 ACL and AFNLP.
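The combination metric mentioned in the abstract can be sketched as a learned weighting of per-segment component scores. The snippet below is a minimal illustration only: it fits linear weights over hypothetical (BLEU, NIST, 1-TER, METEOR) segment scores against invented human ratings via least squares. The paper itself uses a richer entailment feature set and a different learner; all function names and numbers here are assumptions for illustration.

```python
# Minimal sketch of a combination MT metric: learn weights over
# per-segment scores from component metrics (BLEU, NIST, 1-TER,
# METEOR) against human quality judgments. All data is invented;
# the actual paper uses richer entailment features.
import numpy as np

def fit_combination_weights(metric_scores, human_scores):
    """Fit linear weights (plus a bias term) mapping component
    metric scores to human judgments via least squares."""
    X = np.asarray(metric_scores, dtype=float)
    X = np.hstack([X, np.ones((X.shape[0], 1))])  # bias column
    y = np.asarray(human_scores, dtype=float)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def combined_score(weights, segment_scores):
    """Score one candidate segment with the learned combination."""
    x = np.append(np.asarray(segment_scores, dtype=float), 1.0)
    return float(x @ weights)

# Hypothetical training data: rows are segments, columns are
# (BLEU, NIST, 1-TER, METEOR); targets are human adequacy ratings.
scores = [[0.30, 0.55, 0.60, 0.45],
          [0.50, 0.70, 0.75, 0.65],
          [0.10, 0.40, 0.40, 0.30],
          [0.65, 0.80, 0.85, 0.75]]
ratings = [2.5, 3.8, 1.6, 4.4]

w = fit_combination_weights(scores, ratings)
print(round(combined_score(w, [0.40, 0.60, 0.65, 0.55]), 2))
```

An entailment-based variant would simply extend the feature columns with scores such as lexical-semantic compatibility and argument-structure overlap before fitting.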

APA citation

Padó, S., Galley, M., Jurafsky, D., & Manning, C. (2009). Robust machine translation evaluation with entailment features. In ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf. (pp. 297–305). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1687878.1687922
