Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation

Abstract

Large pretrained language models (PLMs) are often domain- or task-adapted via finetuning or prompting. Finetuning requires modifying all of the parameters and having enough data to avoid overfitting, while prompting requires no training and few examples but limits performance. Instead, we prepare PLMs for data- and parameter-efficient adaptation by learning to learn the difference between general and adapted PLMs. This difference is expressed in terms of model weights and sublayer structure through our proposed dynamic low-rank reparameterization and learned architecture controller. Experiments on few-shot dialogue completion, low-resource abstractive summarization, and multi-domain language modeling show improvements in adaptation time and performance over direct finetuning or preparation via domain-adaptive pretraining. Ablations show that our task-adaptive reparameterization (TARP) and model search (TAMS) components individually improve on other parameter-efficient transfer methods such as adapters and on structure-learning methods such as learned sparsification.
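
The abstract does not spell out how the weight difference is parameterized, so the following is only a generic illustration of a low-rank weight update, not the authors' TARP implementation. It assumes a frozen general weight matrix `w` and a difference written as the product of two small factors `u` and `v`. All names and sizes here (`matmul`, `adapted_weight`, `trainable_params`, the 768-dimensional layer, rank 8) are hypothetical choices made for the example.

```python
# Generic low-rank weight difference (illustrative sketch; not the paper's TARP code).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def adapted_weight(w, u, v):
    """Return w + u @ v, where u is (d_out x r) and v is (r x d_in)."""
    delta = matmul(u, v)
    return [[w_ij + d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]

def trainable_params(d_out, d_in, rank):
    """Parameter counts: (low-rank factors, full matrix)."""
    return rank * (d_out + d_in), d_out * d_in

# Frozen "general" weight (3 x 2) and a rank-1 difference (assumed toy values).
w = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
u = [[1.0], [2.0], [3.0]]   # 3 x 1 factor
v = [[0.5, -1.0]]           # 1 x 2 factor
w_adapted = adapted_weight(w, u, v)

# Hypothetical layer size, chosen only to show the scale of the saving.
low_rank, full = trainable_params(d_out=768, d_in=768, rank=8)
```

Under these assumed sizes, only the two factors would be trained (12,288 values) rather than the full matrix (589,824 values), which is the general sense in which a low-rank difference is parameter-efficient.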

Cite

Hou, Z., Salazar, J., & Polovets, G. (2022). Meta-Learning the Difference: Preparing Large Language Models for Efficient Adaptation. Transactions of the Association for Computational Linguistics, 10, 1249–1265. https://doi.org/10.1162/tacl_a_00517
