Multi-hypothesis machine translation evaluation

17 Citations · 119 Readers (Mendeley users who have this article in their library)

Abstract

Reliably evaluating Machine Translation (MT) through automated metrics is a long-standing problem. One of the main challenges is the fact that multiple outputs can be equally valid. Attempts to minimise this issue include metrics that relax the matching of MT output and reference strings, and the use of multiple references. The latter has been shown to significantly improve the performance of evaluation metrics. However, collecting multiple references is expensive, and in practice a single reference is generally used. In this paper, we propose an alternative approach: instead of modelling linguistic variation in human references, we exploit the MT model's uncertainty to generate multiple diverse translations and use these (i) as surrogates for reference translations, (ii) to obtain a quantification of translation variability that can complement existing metric scores, or (iii) to replace references altogether. We show that for a number of popular evaluation metrics our variability estimates lead to substantial improvements in correlation with human judgements of quality, by up to 15%.
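The core idea — scoring an MT output against several model-sampled translations used as pseudo-references, then averaging — can be sketched as follows. This is a minimal illustration, not the authors' implementation: `ngram_f1` is a toy unigram-overlap metric standing in for whatever segment-level metric is being used, and `multi_hypothesis_score` is a hypothetical helper name.

```python
from collections import Counter

def ngram_f1(hyp: str, ref: str) -> float:
    """Toy unigram-overlap F1, a stand-in for any segment-level MT metric."""
    h, r = Counter(hyp.split()), Counter(ref.split())
    overlap = sum((h & r).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(h.values())
    recall = overlap / sum(r.values())
    return 2 * precision * recall / (precision + recall)

def multi_hypothesis_score(mt_output: str, sampled_hyps: list[str],
                           metric=ngram_f1) -> float:
    """Score the MT output against each sampled translation, treating it as a
    pseudo-reference, and average the scores. A high average suggests the model
    is confident (low variability); a low one signals uncertainty."""
    return sum(metric(mt_output, h) for h in sampled_hyps) / len(sampled_hyps)

# Usage: sampled_hyps would come from the MT model itself, e.g. via sampling
# or diverse beam search, rather than from human annotators.
score = multi_hypothesis_score(
    "the cat sat on the mat",
    ["the cat sat on the mat", "a cat was sitting on the mat"],
)
```

In practice the sampled hypotheses come from the same NMT model that produced the output (e.g. by decoding with sampling), so the averaged score doubles as an uncertainty estimate that needs no human reference at all.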

Citation (APA)

Fomicheva, M., Specia, L., & Guzmán, F. (2020). Multi-hypothesis machine translation evaluation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 1218–1232). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.113

Readers' Seniority

PhD / Post grad / Masters / Doc: 41 (75%)
Researcher: 8 (15%)
Lecturer / Post doc: 4 (7%)
Professor / Associate Prof.: 2 (4%)

Readers' Discipline

Computer Science: 46 (74%)
Linguistics: 8 (13%)
Engineering: 4 (6%)
Business, Management and Accounting: 4 (6%)
