The University of Edinburgh participated in the WMT19 Shared Task on News Translation in six language directions: English↔Gujarati, English↔Chinese, German→English, and English→Czech. For all translation directions, we created or used back-translations of monolingual data in the target language as additional synthetic training data. For English↔Gujarati, we also explored semi-supervised MT with cross-lingual language model pre-training, and translation pivoting through Hindi. For translation to and from Chinese, we investigated character-based tokenisation vs. sub-word segmentation of Chinese text. For German→English, we studied the impact of vast amounts of back-translated training data on translation quality, gaining a few additional insights over Edunov et al. (2018). For English→Czech, we compared different pre-processing and tokenisation regimes.
Bawden, R., Bogoychev, N., Germann, U., Grundkiewicz, R., Kirefu, F., Barone, A. V. M., & Birch, A. (2019). The University of Edinburgh’s submissions to the WMT19 news translation task. In WMT 2019 - 4th Conference on Machine Translation, Proceedings of the Conference (Vol. 2, pp. 103–115). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/w19-5304