Linguistic features for automatic evaluation of heterogenous MT systems


Abstract

Evaluation results recently reported by Callison-Burch et al. (2006) and Koehn and Monz (2006) revealed that, in certain cases, the BLEU metric may not be a reliable indicator of MT quality. This happens, for instance, when the systems under evaluation are based on different paradigms and therefore do not share the same lexicon. The reason is that, while MT quality has many dimensions, BLEU limits its scope to the lexical one. In this work, we suggest using metrics that take into account linguistic features at more abstract levels. We provide experimental results showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) are able to produce more reliable system rankings than metrics based on lexical matching alone, especially when the systems under evaluation are of a different nature.
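To illustrate the lexical-matching limitation the abstract refers to, the sketch below computes BLEU's modified (clipped) n-gram precision for two hypothetical candidate translations of the same reference. The example sentences and the single-reference, sentence-level setup are illustrative assumptions, not data from the paper; the point is only that a candidate using different lexical choices (as a system from a different paradigm might) scores lower under n-gram matching even when its meaning is equivalent.

```python
from collections import Counter

def ngrams(tokens, n):
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """BLEU-style modified n-gram precision: candidate n-gram counts
    are clipped by their counts in the reference."""
    cand = Counter(ngrams(candidate, n))
    ref = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref[g]) for g, count in cand.items())
    total = sum(cand.values())
    return overlap / total if total else 0.0

# Hypothetical example: same meaning, different lexical realizations.
reference = "the patient was examined by the doctor".split()
cand_a = "the physician examined the patient".split()      # paraphrase, different lexicon
cand_b = "the patient was examined by a doctor".split()    # near-verbatim match

for name, cand in [("A (paraphrase)", cand_a), ("B (near-verbatim)", cand_b)]:
    p2 = modified_precision(cand, reference, 2)
    print(f"{name}: bigram precision = {p2:.2f}")
```

Candidate A's bigram precision is 0.25 versus 0.67 for candidate B, even though both convey the same content, which is the kind of ranking distortion that metrics using syntactic or shallow-semantic features aim to avoid.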

Citation (APA)

Giménez, J., & Màrquez, L. (2007). Linguistic features for automatic evaluation of heterogenous MT systems. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 256–264). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1626355.1626393
