Identification of relevant and redundant automatic metrics for MT evaluation

Abstract

This paper addresses automatic metrics for translation quality assessment (TQA), specifically metrics for evaluating machine translation (MT) output: Precision, Recall, F-measure, BLEU, PER, WER, and CDER. We examine their reliability and identify the metrics that decrease the reliability of automatic MT evaluation. Besides the traditional measures (Cronbach’s alpha and standardized alpha), we use entropy to assess the reliability of the automatic MT metrics. The results were obtained on a dataset of translations from a low-resource language, Slovak (SK), into English (EN). The main contribution is the identification of redundant automatic MT evaluation metrics.
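The reliability measures named in the abstract can be illustrated with a short sketch. The snippet below computes Cronbach’s alpha and standardized alpha from their textbook definitions, treating each automatic metric as an "item" scored per sentence; the function names and the per-sentence score matrix are illustrative assumptions, not the authors’ actual code or data.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a (n_sentences, k_metrics) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each metric
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of summed scores
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def standardized_alpha(scores):
    """Standardized alpha via the mean inter-metric correlation."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    corr = np.corrcoef(scores, rowvar=False)
    r_bar = (corr.sum() - k) / (k * (k - 1))     # mean off-diagonal correlation
    return k * r_bar / (1 + (k - 1) * r_bar)
```

High alpha indicates that the metrics measure the same underlying quality consistently; a metric whose removal raises alpha is a candidate for redundancy in the sense studied here.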

Citation (APA)

Munk, M., Munková, D., & Benko, Ľ. (2016). Identification of relevant and redundant automatic metrics for MT evaluation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10053 LNAI, pp. 141–152). Springer Verlag. https://doi.org/10.1007/978-3-319-49397-8_12
