Quality Scoring of Source Words in Machine Translations

Abstract

Word-level quality scores on input source sentences can provide useful feedback to an end-user when translating into an unfamiliar target language. Recent approaches either require training custom models on synthetic data or repeatedly invoke the translation model. We propose a simple approach based on comparing probabilities from two language models. The basic premise of our method is to reason about how well each source word is explained by the generated translation, as opposed to by the preceding source-language words alone. Our approach yields an F1 score between 2.2 and 27.1 points higher than state-of-the-art methods on three language pairs, and is significantly faster. Moreover, our method does not require training any new model. We release a public dataset of word omissions and mistranslations for a new language pair.
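The two-model comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' released implementation: the Hugging Face model names, the hypothetical German-to-English setup, the whitespace word segmentation, and the prefix-based word scoring are all assumptions made for the sketch. A reverse (target-to-source) translation model stands in for p(source word | translation, preceding source words), and a source-side language model for p(source word | preceding source words); a low log-ratio flags a source word poorly explained by the translation.

```python
import torch
from transformers import (AutoModelForCausalLM, AutoModelForSeq2SeqLM,
                          AutoTokenizer)

# Hypothetical setup for a German source translated into English.
# A reverse (English->German) MT model approximates p(source | translation);
# a German causal LM approximates p(source word | preceding source words).
REV_MT_NAME = "Helsinki-NLP/opus-mt-en-de"  # assumed model choice
SRC_LM_NAME = "dbmdz/german-gpt2"           # assumed model choice

rev_tok = AutoTokenizer.from_pretrained(REV_MT_NAME)
rev_mt = AutoModelForSeq2SeqLM.from_pretrained(REV_MT_NAME).eval()
lm_tok = AutoTokenizer.from_pretrained(SRC_LM_NAME)
src_lm = AutoModelForCausalLM.from_pretrained(SRC_LM_NAME).eval()


@torch.no_grad()
def mt_logprob(source_prefix: str, translation: str) -> float:
    """Total log p(source_prefix | translation) under the reverse MT model."""
    enc = rev_tok(translation, return_tensors="pt")
    # Drop the EOS label so that prefix differences isolate the new word.
    labels = rev_tok(text_target=source_prefix,
                     return_tensors="pt").input_ids[:, :-1]
    loss = rev_mt(**enc, labels=labels).loss  # mean NLL per label token
    return -loss.item() * labels.shape[1]


@torch.no_grad()
def lm_logprob(source_prefix: str) -> float:
    """Total log p(source_prefix) under the source LM (first token unscored)."""
    ids = lm_tok(source_prefix, return_tensors="pt").input_ids
    if ids.shape[1] < 2:
        return 0.0
    loss = src_lm(ids, labels=ids).loss  # mean NLL over len-1 predictions
    return -loss.item() * (ids.shape[1] - 1)


def word_scores(source: str, translation: str) -> list[tuple[str, float]]:
    """Score each source word by how much better the translation explains it
    than the preceding source words alone; low scores flag words that are
    plausibly omitted or mistranslated in the output."""
    words = source.split()
    scores, prev_mt, prev_lm = [], 0.0, 0.0
    for k in range(1, len(words) + 1):
        prefix = " ".join(words[:k])
        cur_mt, cur_lm = mt_logprob(prefix, translation), lm_logprob(prefix)
        # score_k = log p(word_k | translation, x_<k) - log p(word_k | x_<k)
        scores.append((words[k - 1], (cur_mt - prev_mt) - (cur_lm - prev_lm)))
        prev_mt, prev_lm = cur_mt, cur_lm
    return scores


if __name__ == "__main__":
    src = "Der schnelle braune Fuchs springt über den faulen Hund ."
    hyp = "The quick brown fox jumps over the dog ."  # "faulen" (lazy) dropped
    for word, score in word_scores(src, hyp):
        print(f"{word:12s} {score:+7.2f}")
```

The O(n²) prefix loop is only there to keep the sketch tokenizer-agnostic; a real implementation would score every token in a single forward pass per model and map subword log-probabilities back to words, which is consistent with the abstract's claim that the translation model need not be invoked repeatedly.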

Citation (APA)

Jain, P., Sarawagi, S., & Tomar, T. (2022). Quality Scoring of Source Words in Machine Translations. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 10683–10691). Association for Computational Linguistics (ACL).
