This paper describes CAiRE's submission to the unsupervised machine translation track of the WMT'19 news shared task from German to Czech. We leverage a phrase-based statistical machine translation (PBSMT) model and a pre-trained language model to combine word-level neural machine translation (NMT) and subword-level NMT models without using any parallel data. To address the morphological richness of the two languages, we propose training byte-pair encoding (BPE) embeddings for German and Czech separately and aligning them using MUSE (Conneau et al., 2018). To ensure the fluency and consistency of translations, we propose a rescoring mechanism that reuses the pre-trained language model to select among the translation candidates generated through beam search. Moreover, we apply a series of pre-processing and post-processing techniques to improve the quality of the final translations.
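The rescoring step described above can be sketched as follows: score each beam-search candidate with the pre-trained language model and keep the highest-scoring one. This is a minimal illustration, not the paper's implementation; the helper names (`lm_rescore`, `lm_log_prob`) and the length normalization are assumptions for the sketch.

```python
import math

def lm_rescore(candidates, lm_log_prob):
    """Select the beam-search candidate with the best length-normalized
    language-model score.

    candidates  -- list of tokenized translation hypotheses
    lm_log_prob -- callable returning the LM log-probability of a token list
                   (hypothetical interface; the paper reuses its pre-trained LM)
    """
    best, best_score = None, -math.inf
    for tokens in candidates:
        # Normalize by length so shorter hypotheses are not unduly favored.
        score = lm_log_prob(tokens) / max(len(tokens), 1)
        if score > best_score:
            best, best_score = tokens, score
    return best
```

A toy unigram scorer is enough to exercise the selection logic; in the actual system the score would come from the pre-trained language model.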
CITATION STYLE
Liu, Z., Xu, Y., Winata, G. I., & Fung, P. (2019). Incorporating word and subword units in unsupervised machine translation using language model rescoring. In WMT 2019 - 4th Conference on Machine Translation, Proceedings of the Conference (Vol. 2, pp. 275–282). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w19-5327