Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT


Abstract

Although neural machine translation (NMT) yields promising translation performance, it suffers from over- and under-translation issues [31], which have become a research hotspot in NMT. At present, studies of these issues mainly rely on dominant automatic evaluation metrics, such as BLEU, to assess overall translation quality with respect to both adequacy and fluency. However, such metrics cannot accurately measure how well NMT systems handle the above-mentioned issues. In this paper, we propose two quantitative metrics, Otem and Utem, to automatically evaluate system performance in terms of over- and under-translation, respectively. Both metrics are based on the proportion of mismatched n-grams between the gold reference and the system translation. We validate both metrics by comparing their scores with human evaluations, where the resulting Pearson correlation coefficients reveal a strong correlation. Moreover, in-depth analyses of various translation systems expose inconsistencies between BLEU and our proposed metrics, highlighting the necessity and significance of our metrics.
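The core idea of counting mismatched n-grams can be sketched as follows. This is a simplified illustration, not the paper's exact formulation (the published Otem/Utem definitions include additional refinements such as brevity-style penalties and score combination across n-gram orders); here, over-translation is approximated by the proportion of hypothesis n-grams exceeding their reference counts, and under-translation by the proportion of reference n-grams absent from the hypothesis:

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def otem_utem(hypothesis, reference, max_n=2):
    """Simplified sketch of over-/under-translation scores based on
    mismatched n-gram proportions (an assumption for illustration;
    the paper's exact formulas differ)."""
    over, under = [], []
    for n in range(1, max_n + 1):
        hyp = ngram_counts(hypothesis, n)
        ref = ngram_counts(reference, n)
        # Over-translation: hypothesis n-grams occurring more often than in the reference.
        extra = sum((hyp - ref).values())
        # Under-translation: reference n-grams missing (or undercounted) in the hypothesis.
        missing = sum((ref - hyp).values())
        over.append(extra / max(sum(hyp.values()), 1))
        under.append(missing / max(sum(ref.values()), 1))
    # Average the per-order proportions (uniform weights, another simplification).
    return sum(over) / max_n, sum(under) / max_n

# Example: a repeated phrase inflates the over-translation score only.
o, u = otem_utem("the cat the cat sat".split(), "the cat sat".split())
```

Higher scores indicate worse behavior: a hypothesis that repeats source content raises the first value, while one that drops content raises the second, whereas BLEU would only penalize both indirectly through overall n-gram precision.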

Citation (APA)

Yang, J., Zhang, B., Qin, Y., Zhang, X., Lin, Q., & Su, J. (2018). Otem&Utem: Over- and Under-Translation Evaluation Metric for NMT. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11108 LNAI, pp. 291–302). Springer Verlag. https://doi.org/10.1007/978-3-319-99495-6_25
