Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 score on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with a 17x speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.
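The throughput gap described in the abstract comes from how the two decoding schemes schedule decoder calls. The sketch below is a minimal conceptual illustration of that contrast only, not the paper's PreDAT implementation; the toy_logits function is a hypothetical placeholder standing in for a real Transformer decoder.

```python
# Conceptual sketch (assumption: not the paper's code). Autoregressive
# decoding emits one token per decoder call and each call depends on the
# previous outputs; non-autoregressive decoding predicts all positions in
# one parallel pass, which is the source of the throughput advantage.

from typing import List

def toy_logits(prefix: List[str], position: int) -> str:
    # Hypothetical stand-in for a real decoder forward pass.
    canned = ["the", "cat", "sat", "down", "<eos>"]
    return canned[min(position, len(canned) - 1)]

def autoregressive_decode(max_len: int = 5) -> List[str]:
    # One decoder call per token; steps cannot run in parallel because
    # each prediction is conditioned on the tokens generated so far.
    out: List[str] = []
    for t in range(max_len):
        token = toy_logits(out, t)
        out.append(token)
        if token == "<eos>":
            break
    return out

def nar_decode(target_len: int = 5) -> List[str]:
    # A single parallel pass: every position is predicted independently
    # of the other outputs, so all positions can be computed at once.
    return [toy_logits([], t) for t in range(target_len)]

if __name__ == "__main__":
    print("AR :", autoregressive_decode())
    print("NAR:", nar_decode())
```

In a real NAR model such as a Directed Acyclic Transformer, the parallel pass is a batched matrix operation rather than a Python loop, so wall-clock latency scales with one forward pass instead of one pass per token.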
CITATION STYLE
Huang, F., Ke, P., & Huang, M. (2023). Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation. Transactions of the Association for Computational Linguistics, 11, 941–959. https://doi.org/10.1162/tacl_a_00582