Hybrid-Regressive Paradigm for Accurate and Speed-Robust Neural Machine Translation

Abstract

This study provides empirical evidence that non-autoregressive translation (NAT) is less robust than autoregressive translation (AT) to changes in decoding batch size and hardware settings. To address this issue, we demonstrate through synthetic experiments that incorporating a small number of AT predictions can significantly reduce the performance gap between AT and NAT. Motivated by this, we propose hybrid-regressive translation (HRT), a two-stage translation prototype that combines the strengths of AT and NAT. Specifically, HRT first generates a discontinuous sequence autoregressively (e.g., predicting only every k-th token, k > 1), and then fills in all previously skipped tokens simultaneously in a non-autoregressive manner. Experimental results on five translation tasks show that HRT achieves translation quality comparable to AT while providing at least 1.5x faster inference, irrespective of batch size and device. Moreover, HRT retains the desirable characteristics of AT in the deep-encoder-shallow-decoder architecture, enabling further speed improvements without sacrificing BLEU scores.
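The two-stage decoding flow described above can be sketched in a few lines. This is a minimal toy illustration, not the authors' implementation: the anchor-prediction and fill rules below are dummy stand-ins for the trained AT and NAT decoders, and all function names are hypothetical.

```python
def skip_at_decode(src, k, max_len):
    """Stage 1: autoregressively predict every k-th target position.
    A dummy rule stands in for a trained AT decoder, which would
    condition each anchor on src and on previously generated anchors."""
    anchors = {}
    for pos in range(0, max_len, k):
        anchors[pos] = f"tok{pos}"  # placeholder for a model prediction
    return anchors

def nat_fill(src, anchors, max_len):
    """Stage 2: fill all skipped positions in a single parallel
    (non-autoregressive) pass, keeping the AT anchors fixed."""
    out = []
    for pos in range(max_len):
        if pos in anchors:
            out.append(anchors[pos])          # token from the AT stage
        else:
            out.append(f"fill{pos}")          # placeholder NAT prediction
    return out

# With k = 2, half the positions are produced autoregressively and the
# rest are filled in one shot, which is the source of the speedup.
anchors = skip_at_decode("src sentence", k=2, max_len=6)
hyp = nat_fill("src sentence", anchors, max_len=6)
print(hyp)  # ['tok0', 'fill1', 'tok2', 'fill3', 'tok4', 'fill5']
```

Only the O(max_len / k) anchor steps are sequential; the fill pass runs once regardless of sequence length, which is why the speedup holds across batch sizes and devices.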

Citation (APA)

Wang, Q., Hu, X., & Chen, M. (2023). Hybrid-Regressive Paradigm for Accurate and Speed-Robust Neural Machine Translation. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 5931–5945). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.367
