From research to production and back: Ludicrously fast neural machine translation

61 citations · 105 Mendeley readers

Abstract

This paper describes the submissions of the “Marian” team to the WNGT 2019 efficiency shared task. Taking our dominating submissions to the previous edition of the shared task as a starting point, we develop improved teacher-student training via multi-agent dual-learning and noisy backward-forward translation for Transformer-based student models. For efficient CPU-based decoding, we propose pre-packed 8-bit matrix products, improved batched decoding, cache-friendly student architectures with parameter sharing, and light-weight RNN-based decoder architectures. GPU-based decoding benefits from the same architecture changes, as well as from pervasive 16-bit inference and concurrent streams. These modifications, together with profiler-based C++ code optimization, allow us to push the Pareto frontier established during the 2018 edition towards 24x (CPU) and 14x (GPU) faster models at comparable or higher BLEU values. Our fastest CPU model is more than 4x faster than last year’s fastest submission at more than 3 points higher BLEU. Our fastest GPU model at 1.5 seconds translation time is slightly faster than last year’s fastest RNN-based submission, but outperforms the fastest RNN-based and Transformer-based submissions by more than 4 and 10 BLEU points respectively.
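The 8-bit matrix products mentioned above rest on a standard idea: quantize float activations and weights to signed 8-bit integers, accumulate the products in wider integer registers, and rescale the result back to floats. The sketch below is a minimal, pure-Python illustration of that scheme (symmetric linear quantization with a per-tensor scale); it is not the Marian team's pre-packed kernels, and the function names are invented for this example.

```python
def quantize(values, bits=8):
    """Symmetric linear quantization of a float vector to signed ints.

    Returns (int_values, scale) such that each float v is approximated
    by q * scale, with q in [-(2^(bits-1)-1), 2^(bits-1)-1].
    """
    qmax = 2 ** (bits - 1) - 1            # 127 for int8
    amax = max(abs(v) for v in values) or 1.0
    scale = amax / qmax
    q = [round(v / scale) for v in values]
    return q, scale


def int8_dot(a, b):
    """Dot product computed entirely in integer arithmetic, then rescaled.

    The accumulation uses Python's unbounded ints here, standing in for
    the int32 accumulators of a real int8 GEMM kernel.
    """
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulate
    return acc * sa * sb                       # dequantize the result


# The quantized result closely tracks the exact float dot product:
a = [0.5, -1.25, 2.0, 0.75]
b = [1.0, 0.5, -0.25, 2.0]
exact = sum(x * y for x, y in zip(a, b))
approx = int8_dot(a, b)
```

In production kernels the weight matrix is quantized once offline and "pre-packed" into a cache-friendly memory layout, so decoding pays only for quantizing the activations and the integer multiply-accumulate, which is where the CPU speed-up comes from.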

Cite

CITATION STYLE

APA

Kim, Y. J., Junczys-Dowmunt, M., Hassan, H., Aji, A. F., Heafield, K., Grundkiewicz, R., & Bogoychev, N. (2019). From research to production and back: Ludicrously fast neural machine translation. In EMNLP-IJCNLP 2019 - Proceedings of the 3rd Workshop on Neural Generation and Translation (pp. 280–288). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/d19-5632
