Multi-source sequence generation (MSG) is an important class of sequence generation tasks that take multiple sources as input, including automatic post-editing, multi-source translation, and multi-document summarization. Because MSG tasks suffer from data scarcity and recent pretrained models have proven effective for low-resource downstream tasks, transferring pretrained sequence-to-sequence models to MSG tasks is essential. A simple way to perform this transfer is to finetune a pretrained model directly on an MSG task with the multiple sources concatenated into a single long sequence. However, we conjecture that such direct finetuning leads to catastrophic forgetting and that relying solely on the pretrained self-attention layers to capture cross-source information is insufficient. Therefore, we propose a two-stage finetuning method to alleviate the pretrain-finetune discrepancy and introduce a novel MSG model with a fine encoder that learns better representations for MSG tasks. Experiments show that our approach achieves new state-of-the-art results on the WMT17 APE task and on multi-source translation with the WMT14 test set. When adapted to document-level translation, our framework significantly outperforms strong baselines.
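To make the direct-finetuning baseline described above concrete, the sketch below concatenates multiple sources into a single input sequence and feeds it to a generic pretrained sequence-to-sequence model via Hugging Face Transformers. The model name (`t5-small`), separator string, and example inputs are illustrative assumptions for an APE-style setting, not the paper's actual configuration.

```python
# Minimal sketch of the "concatenate sources" direct-finetuning baseline.
# Assumptions: t5-small as the pretrained seq2seq model, a plain-text
# separator between sources, and toy APE inputs (source + MT draft).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # assumed pretrained model, not the paper's choice
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def concatenate_sources(sources, sep=" </s> "):
    # Join all source texts (e.g., original sentence and MT draft for APE,
    # or parallel inputs for multi-source translation) into one sequence.
    return sep.join(sources)

src = "Das ist ein Beispiel."   # original source sentence (APE setting)
mt = "This is a example."       # machine translation draft to be post-edited
inputs = tokenizer(concatenate_sources([src, mt]), return_tensors="pt")

# Generate a target sequence; in this baseline, cross-source interaction
# happens only through the pretrained self-attention over the concatenation.
output_ids = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

This baseline leaves it to the pretrained self-attention layers to relate the concatenated sources, which is exactly the limitation the proposed two-stage finetuning and fine encoder are intended to address.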
Citation
Huang, X., Xu, J., Sun, M., & Liu, Y. (2021). Transfer learning for sequence generation: From single-source to multi-source. In ACL-IJCNLP 2021 - 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 5738–5750). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.acl-long.446