Abstract
Automatic evaluation of machine translation (MT) systems is an important research topic for the advancement of MT technology. Most automatic evaluation methods proposed to date are score-based: they compute scores that represent translation quality, and MT systems are compared on the basis of these scores. We advocate an alternative perspective on automatic MT evaluation based on ranking. Instead of producing scores, we directly produce a ranking over the set of MT systems to be compared. This perspective is often simpler when the evaluation goal is system comparison. We argue that it is easier to elicit human judgments in the form of rankings, and we develop a machine learning approach that trains on rank data. We compare this ranking method to a score-based regression method on WMT07 data. Results indicate that ranking achieves higher correlation with human judgments, especially when ranking-specific features are used.
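To make the contrast concrete, here is a minimal illustrative sketch (not the paper's actual implementation) of the two perspectives: regression fits a linear model to per-system quality scores, while the ranking approach trains on pairwise preferences derived from a human ranking. The feature vectors, system names, and target values below are invented for illustration.

```python
# Illustrative sketch: score-based regression vs. pairwise ranking for
# comparing MT systems. All data here is hypothetical toy data.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_regression(data, epochs=100, lr=0.01):
    """Fit a linear model to (features, human score) pairs via SGD on
    squared error -- the score-based perspective."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for x, y in data:
            err = dot(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w

def train_pairwise_ranker(pairs, epochs=100, lr=0.1):
    """Perceptron-style pairwise ranker: for each (better, worse) pair,
    nudge w so that score(better) > score(worse) -- the rank perspective."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        for better, worse in pairs:
            if dot(w, better) <= dot(w, worse):
                diff = [b - c for b, c in zip(better, worse)]
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

# Hypothetical per-system features (e.g. an n-gram match rate, a length ratio).
feats = {"A": [0.9, 1.0], "B": [0.6, 0.9], "C": [0.3, 0.8]}

# Regression trains on absolute quality scores (hypothetical targets).
w_reg = train_regression([(feats["A"], 3.0), (feats["B"], 2.0), (feats["C"], 1.0)])

# Ranking trains only on the human preference A > B > C, as (better, worse) pairs.
pairs = [(feats["A"], feats["B"]), (feats["B"], feats["C"]), (feats["A"], feats["C"])]
w_rank = train_pairwise_ranker(pairs)

ranking = sorted(feats, key=lambda s: dot(w_rank, feats[s]), reverse=True)
```

The design point is that the ranker never needs calibrated scores from annotators, only relative preferences, which mirrors the argument that rank judgments are easier to elicit than absolute quality scores.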
Citation
Duh, K. (2008). Ranking vs. regression in machine translation evaluation. In 3rd Workshop on Statistical Machine Translation, WMT 2008 at the Annual Meeting of the Association for Computational Linguistics, ACL 2008 (pp. 191–194). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1626394.1626425