Unsupervised neural machine translation (UNMT) has recently made significant progress without requiring any parallel data. UNMT models typically follow a sequence-to-sequence architecture, with an encoder that maps sentences in different languages to a shared latent space and a decoder that generates the corresponding translations. Denoising autoencoding and back-translation are applied in every training iteration so that the model learns the relationships between sentence pairs within a language or across languages. However, sentences produced by the noise model of autoencoding or by the reverse model of back-translation usually differ from those written by humans, which may introduce bias at inference time. In this paper, we propose a regularization method for back-translation that explicitly draws the representations of sentence pairs closer together in the shared space. To make the model more robust to sentences produced by autoencoding or back-translation, we apply adversarial attacks on the representations. Experiments on the unsupervised English-French, English-German, and English-Romanian benchmarks show that our approach outperforms the cross-lingual language model (XLM) baseline by 0.4 to 1.8 BLEU points. Additionally, on noisy test sets the improvement exceeds 5 BLEU points in most translation directions.
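The abstract does not specify the exact form of the regularizer or of the adversarial attack. The following is a minimal sketch, not the authors' implementation, assuming that sentence representations are mean-pooled encoder states, that the regularizer is the squared L2 distance between a sentence and its back-translation, and that the attack is an FGSM-style (fast gradient sign) step on the representation. The names `mean_pool`, `repr_regularizer`, `fgsm_perturb`, and `robust_regularizer` are hypothetical.

```python
from typing import List

Vector = List[float]


def mean_pool(token_states: List[Vector]) -> Vector:
    """Average token-level encoder states into one sentence vector (assumed pooling)."""
    n = len(token_states)
    dim = len(token_states[0])
    return [sum(tok[d] for tok in token_states) / n for d in range(dim)]


def repr_regularizer(h_src: Vector, h_bt: Vector) -> float:
    """Squared L2 distance between two sentence representations (hypothetical choice)."""
    return sum((a - b) ** 2 for a, b in zip(h_src, h_bt))


def sign(x: float) -> float:
    """Return 1.0, -1.0, or 0.0 according to the sign of x."""
    return float((x > 0) - (x < 0))


def fgsm_perturb(h_src: Vector, h_bt: Vector, eps: float) -> Vector:
    """Shift h_bt by eps along the sign of the regularizer's gradient (FGSM-style).

    The gradient of sum((h_src - h_bt)^2) with respect to h_bt is
    2 * (h_bt - h_src), so this step moves h_bt away from h_src and
    increases the regularizer, i.e. it is the worst case within an
    L-infinity ball of radius eps (to first order).
    """
    grad = [2.0 * (b - a) for a, b in zip(h_src, h_bt)]
    return [b + eps * sign(g) for b, g in zip(h_bt, grad)]


def robust_regularizer(h_src: Vector, h_bt: Vector, eps: float) -> float:
    """Clean term plus adversarial term (assumed combination, unweighted)."""
    h_adv = fgsm_perturb(h_src, h_bt, eps)
    return repr_regularizer(h_src, h_bt) + repr_regularizer(h_src, h_adv)


if __name__ == "__main__":
    h_src = mean_pool([[1.0, 2.0], [3.0, 4.0]])  # [2.0, 3.0]
    h_bt = mean_pool([[2.0, 2.0], [2.0, 2.0]])   # [2.0, 2.0]
    print(repr_regularizer(h_src, h_bt))          # 1.0
    print(fgsm_perturb(h_src, h_bt, eps=0.5))     # [2.0, 1.5]
    print(robust_regularizer(h_src, h_bt, 0.5))   # 3.25
```

Minimizing the combined term pulls the back-translated representation toward the original while also penalizing the perturbed copy, which is one simple way to ask the encoder to be stable under small shifts in representation space. A real system would compute gradients through the encoder with an autodiff framework rather than in closed form.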
Yu, H., Luo, H., Yi, Y., & Cheng, F. (2021). A2R2: Robust Unsupervised Neural Machine Translation with Adversarial Attack and Regularization on Representations. IEEE Access, 9, 19990–19998. https://doi.org/10.1109/ACCESS.2021.3054935