Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation

4 citations · 10 Mendeley readers

Abstract

Non-AutoRegressive (NAR) text generation models have drawn much attention because of their significantly faster decoding speed and good generation quality in machine translation. However, in a wider range of text generation tasks, existing NAR models lack proper pre-training, making them still far behind the pre-trained autoregressive models. In this paper, we propose Pre-trained Directed Acyclic Transformer (PreDAT) and a novel pre-training task to promote prediction consistency in NAR generation. Experiments on five text generation tasks show that our PreDAT remarkably outperforms existing pre-trained NAR models (+4.2 score on average) and even achieves better results than pre-trained autoregressive baselines in n-gram-based metrics, along with a 17 times speedup in throughput. Further analysis shows that PreDAT benefits from the unbiased prediction order that alleviates the error accumulation problem in autoregressive generation, which provides new insights into the advantages of NAR generation.

Cite

APA

Huang, F., Ke, P., & Huang, M. (2023). Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation. Transactions of the Association for Computational Linguistics, 11, 941–959. https://doi.org/10.1162/tacl_a_00582
