Muppet: Massive Multi-task Representations with Pre-Finetuning

Abstract

We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples) and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial: pre-finetuning can hurt performance when too few tasks are used, up to a critical point (usually above 15 tasks), after which performance improves linearly in the number of tasks.
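
To make the recipe concrete, the sketch below shows one way multi-task pre-finetuning can be set up: a shared pretrained encoder with a lightweight classification head per task, with each optimizer step mixing gradients from several tasks. This is a minimal illustration, not the authors' released implementation; the MultiTaskModel class, the prefinetune helper, and the four-tasks-per-step sampling are simplifying assumptions, and the paper's full recipe additionally covers generation tasks and balances losses across heterogeneous task types, which this sketch omits.

# Minimal sketch of multi-task pre-finetuning (illustrative, not the authors' code).
# Assumes a shared pretrained encoder and one classification head per task.
import random
import torch
import torch.nn as nn
from transformers import AutoModel

class MultiTaskModel(nn.Module):
    def __init__(self, encoder_name, num_labels_per_task):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        hidden = self.encoder.config.hidden_size
        # One linear head per task; the encoder parameters are shared.
        self.heads = nn.ModuleDict({
            task: nn.Linear(hidden, n_labels)
            for task, n_labels in num_labels_per_task.items()
        })

    def forward(self, task, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]   # first-token representation
        return self.heads[task](pooled)        # task-specific logits

def prefinetune(model, task_batches, steps, lr=1e-5):
    # task_batches: dict mapping task name -> infinite iterator of
    # (input_ids, attention_mask, labels) batches for that dataset.
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    tasks = list(task_batches)
    for _ in range(steps):
        optimizer.zero_grad()
        # Accumulate losses from a few tasks per update so each gradient
        # step reflects a mixture of tasks rather than a single dataset.
        for task in random.sample(tasks, k=min(4, len(tasks))):
            input_ids, attention_mask, labels = next(task_batches[task])
            logits = model(task, input_ids, attention_mask)
            loss_fn(logits, labels).backward()
        optimizer.step()

In practice each dataset would be wrapped in a shuffled, repeating iterator, and the resulting shared encoder would then be fine-tuned on the downstream target task as usual.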

Cite

APA

Aghajanyan, A., Gupta, A., Shrivastava, A., Chen, X., Zettlemoyer, L., & Gupta, S. (2021). Muppet: Massive Multi-task Representations with Pre-Finetuning. In EMNLP 2021 - 2021 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 5799–5811). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.emnlp-main.468
