Incorporating word and subword units in unsupervised machine translation using language model rescoring


Abstract

This paper describes CAiRE's submission to the unsupervised machine translation track of the WMT'19 news shared task from German to Czech. We leverage a phrase-based statistical machine translation (PBSMT) model and a pre-trained language model to combine word-level neural machine translation (NMT) and subword-level NMT models without using any parallel data. To address the morphological richness of the two languages, we train byte-pair encoding (BPE) embeddings for German and Czech separately and align them using MUSE (Conneau et al., 2018). To ensure the fluency and consistency of translations, we propose a rescoring mechanism that reuses the pre-trained language model to select among the translation candidates generated through beam search. Moreover, we apply a series of pre-processing and post-processing steps to improve the quality of the final translations.
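The rescoring idea in the abstract can be sketched in a few lines: score each beam-search candidate with a language model and keep the most fluent one. This is a minimal illustration only — the paper reuses a large pre-trained LM, whereas the toy unigram log-probabilities and helper names below are invented for the sketch.

```python
import math

# Toy unigram LM standing in for the pre-trained language model.
# Log-probabilities are illustrative, not from the paper.
TOY_LM = {
    "the": -1.0, "cat": -3.0, "sat": -3.5, "mat": -4.0,
    "on": -1.5, "sit": -5.0, "<unk>": -8.0,
}

def lm_score(sentence: str) -> float:
    """Sum token log-probabilities, length-normalized so short
    candidates are not trivially favored."""
    tokens = sentence.lower().split()
    total = sum(TOY_LM.get(t, TOY_LM["<unk>"]) for t in tokens)
    return total / max(len(tokens), 1)

def rescore(candidates: list) -> str:
    """Return the beam-search candidate the LM scores as most fluent."""
    return max(candidates, key=lm_score)

beam = ["the cat sit on the mat", "the cat sat on the mat"]
best = rescore(beam)
```

With a real pre-trained LM, `lm_score` would be replaced by the model's sentence log-likelihood; the selection logic stays the same.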

Citation (APA)

Liu, Z., Xu, Y., Winata, G. I., & Fung, P. (2019). Incorporating word and subword units in unsupervised machine translation using language model rescoring. In WMT 2019 - 4th Conference on Machine Translation, Proceedings of the Conference (Vol. 2, pp. 275–282). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w19-5327
