Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting

Emmy Liu; Aditi Chaudhary; Graham Neubig

Conference Proceedings, Open Access

EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (2023) 15095-15111

DOI: 10.18653/v1/2023.emnlp-main.933

Abstract

Idioms are common in everyday language, but often pose a challenge to translators because their meanings do not follow from the meanings of their parts. Despite significant advances, machine translation systems still struggle to translate idiomatic expressions. We provide a simple characterization of idiomatic translation and related issues. This allows us to conduct a synthetic experiment revealing a tipping point at which transformer-based machine translation models correctly default to idiomatic translations. To expand multilingual resources, we compile a dataset of ∼4k natural sentences containing idiomatic expressions in French, Finnish, and Japanese. To improve translation of natural idioms, we introduce two straightforward yet effective techniques: strategically upweighting the training loss on potentially idiomatic sentences, and using retrieval-augmented models. Together, these techniques not only improve the accuracy of a strong pretrained MT model on idiomatic sentences by up to 13% in absolute accuracy, but also hold potential benefits for non-idiomatic sentences.
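As a rough illustration of the loss-upweighting idea mentioned in the abstract, the sketch below scales the per-sentence negative log-likelihood by a larger factor when a sentence is flagged as potentially idiomatic. The abstract does not specify the weighting scheme, so the function names, the boolean flag, and the `idiom_weight` value are all assumptions made for illustration, not the authors' implementation.

```python
import math


def sentence_nll(token_probs):
    """Negative log-likelihood of one target sentence.

    token_probs: model probabilities assigned to each reference token.
    """
    return -sum(math.log(p) for p in token_probs)


def weighted_batch_loss(batch, idiom_weight=2.0):
    """Mean per-sentence NLL with potentially idiomatic sentences upweighted.

    batch: list of (token_probs, is_potentially_idiomatic) pairs.
    idiom_weight: hypothetical multiplier (not a value from the paper).
    """
    total = 0.0
    for token_probs, is_idiomatic in batch:
        # Assumed scheme: a constant multiplier on flagged sentences.
        weight = idiom_weight if is_idiomatic else 1.0
        total += weight * sentence_nll(token_probs)
    return total / len(batch)


# Toy batch: one flagged sentence (two tokens) and one unflagged (one token).
toy_batch = [([0.5, 0.5], True), ([0.5], False)]
upweighted = weighted_batch_loss(toy_batch, idiom_weight=2.0)
uniform = weighted_batch_loss(toy_batch, idiom_weight=1.0)
```

With a weight above 1, errors on flagged sentences contribute more to the gradient than errors on other sentences. How sentences are flagged and how large the weight should be are left open here.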

Cite

Liu, E., Chaudhary, A., & Neubig, G. (2023). Crossing the Threshold: Idiomatic Machine Translation through Retrieval Augmentation and Loss Weighting. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 15095–15111). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.933
