Denoising based sequence-to-sequence pre-training for text generation

Liang Wang; Wei Zhao; Ruoyu Jia; Sujian Li; Jingming Liu

Conference Proceedings (Open Access)

EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (2019) 4003-4015

DOI: 10.18653/v1/D19-1412

Abstract

This paper presents a new sequence-to-sequence (seq2seq) pre-training method PoDA (Pre-training of Denoising Autoencoders), which learns representations suitable for text generation tasks. Unlike encoder-only (e.g., BERT) or decoder-only (e.g., OpenAI GPT) pre-training approaches, PoDA jointly pre-trains both the encoder and decoder by denoising the noise-corrupted text, and it also has the advantage of keeping the network architecture unchanged in the subsequent fine-tuning stage. Meanwhile, we design a hybrid model of Transformer and pointer-generator networks as the backbone architecture for PoDA. We conduct experiments on two text generation tasks: abstractive summarization, and grammatical error correction. Results on four datasets show that PoDA can improve model performance over strong baselines without using any task-specific techniques and significantly speed up convergence.
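
The abstract names two mechanisms without showing them: pre-training the encoder and decoder together by reconstructing clean text from a noise-corrupted copy, and a pointer-generator output layer that can copy source words. The sketch below illustrates both under stated assumptions. The noise types (local shuffling, word deletion, random replacement) and their rates are illustrative guesses, not PoDA's published settings. The mixture formula is the standard pointer-generator of See et al. (2017), not necessarily the exact variant PoDA combines with the Transformer. The function names `corrupt`, `make_denoising_pair`, and `pointer_generator_mix` are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def corrupt(tokens: List[str], vocab: List[str], rng: random.Random,
            p_drop: float = 0.1, p_replace: float = 0.1,
            shuffle_sigma: float = 0.5) -> List[str]:
    """Hypothetical noising: local shuffle, word deletion, random replacement.

    The noise types and rates are illustrative assumptions, not PoDA's
    published settings.
    """
    # Local shuffle: jitter each position with Gaussian noise, then re-sort,
    # so words move only a short distance from where they started.
    keys = [i + rng.gauss(0.0, shuffle_sigma) for i in range(len(tokens))]
    order = sorted(range(len(tokens)), key=lambda i: keys[i])
    noisy: List[str] = []
    for i in order:
        r = rng.random()
        if r < p_drop:
            continue                          # delete this word
        if r < p_drop + p_replace:
            noisy.append(rng.choice(vocab))   # replace with a random vocabulary word
        else:
            noisy.append(tokens[i])           # keep the word unchanged
    return noisy


def make_denoising_pair(tokens: List[str], vocab: List[str],
                        rng: random.Random) -> Tuple[List[str], List[str]]:
    # Encoder input is the corrupted text; decoder target is the clean original.
    return corrupt(tokens, vocab, rng), list(tokens)


def pointer_generator_mix(p_vocab: Dict[str, float], attention: List[float],
                          source: List[str], p_gen: float) -> Dict[str, float]:
    """Pointer-generator output distribution (See et al., 2017):
    P(w) = p_gen * P_vocab(w) + (1 - p_gen) * sum of attention weights
    on source positions whose token is w.
    """
    # Generation path: scale the decoder's vocabulary distribution by p_gen.
    final = {w: p_gen * p for w, p in p_vocab.items()}
    for weight, w in zip(attention, source):
        # Copy path: source words, including out-of-vocabulary ones, gain mass.
        final[w] = final.get(w, 0.0) + (1.0 - p_gen) * weight
    return final


rng = random.Random(0)
sentence = "the cat sat on the mat".split()
src, tgt = make_denoising_pair(sentence, ["dog", "ran", "a"], rng)
print("input: ", src)
print("target:", tgt)

# "mat" is not in the decoder vocabulary here, yet it receives
# about 0.2 * 0.5 = 0.1 probability through the copy path.
dist = pointer_generator_mix({"the": 0.6, "cat": 0.4}, [0.5, 0.5],
                             ["mat", "the"], p_gen=0.8)
print(dist)
```

Because the decoder is always trained to reproduce the clean sentence from its corrupted form, the pre-trained encoder-decoder has the same input/output shape as a summarization or grammatical error correction model, which is consistent with the abstract's point that the architecture stays unchanged during fine-tuning.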

Citation (APA)

Wang, L., Zhao, W., Jia, R., Li, S., & Liu, J. (2019). Denoising based sequence-to-sequence pre-training for text generation. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 4003–4015). Association for Computational Linguistics. https://doi.org/10.18653/v1/D19-1412