Evaluation results recently reported by Callison-Burch et al. (2006) and Koehn and Monz (2006) revealed that, in certain cases, the BLEU metric may not be a reliable indicator of MT quality. This happens, for instance, when the systems under evaluation are based on different paradigms and therefore do not share the same lexicon. The reason is that, while MT quality is multifaceted, BLEU limits its scope to the lexical dimension. In this work, we suggest using metrics that take into account linguistic features at more abstract levels. We provide experimental results showing that metrics based on deeper linguistic information (syntactic/shallow-semantic) produce more reliable system rankings than metrics based on lexical matching alone, especially when the systems under evaluation are of a different nature.
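To make the lexical/syntactic contrast concrete, the following is a minimal Python sketch, not the metrics actually used in the paper: the same clipped n-gram overlap at the heart of BLEU is applied once over surface words and once over hand-written (head-POS, relation, dependent-POS) dependency triples standing in for parser output. The example sentences and triples are illustrative assumptions; two paraphrases can share almost no words yet match fully at the syntactic level.

from collections import Counter

def clipped_ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision over arbitrary item sequences.

    This is the core overlap computation behind BLEU; reusing it over
    syntactic units instead of words shifts the metric to a more
    abstract linguistic level.
    """
    cand = Counter(tuple(candidate[i:i + n])
                   for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n])
                  for i in range(len(reference) - n + 1))
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    total = sum(cand.values())
    return matched / total if total else 0.0

# Lexical level: the paraphrase shares only one surface word ("were").
cand_words = "the authorities were notified at once".split()
ref_words = "officials were informed immediately".split()

# Syntactic level: hypothetical dependency triples standing in for
# parser output; both sentences realize the same structure.
cand_deps = [("VERB", "nsubj", "NOUN"), ("VERB", "aux", "AUX"),
             ("VERB", "advmod", "ADV")]
ref_deps = [("VERB", "nsubj", "NOUN"), ("VERB", "aux", "AUX"),
            ("VERB", "advmod", "ADV")]

print(clipped_ngram_precision(cand_words, ref_words, n=1))  # ~0.17: low lexical overlap
print(clipped_ngram_precision(cand_deps, ref_deps, n=1))    # 1.0: full syntactic overlap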
Citation
Giménez, J., & Màrquez, L. (2007). Linguistic features for automatic evaluation of heterogenous MT systems. In Proceedings of the Second Workshop on Statistical Machine Translation (pp. 256–264). Association for Computational Linguistics. https://doi.org/10.3115/1626355.1626393