To tune or not to tune? Adapting pretrained representations to diverse tasks

Citations: 200 · Readers (Mendeley): 615

Abstract

While most previous work has focused on different pretraining objectives and architectures for transfer learning, we ask how to best adapt the pretrained model to a given target task. We focus on the two most common forms of adaptation: feature extraction (where the pretrained weights are frozen) and directly fine-tuning the pretrained model. Our empirical results across diverse NLP tasks with two state-of-the-art models show that the relative performance of fine-tuning vs. feature extraction depends on the similarity of the pretraining and target tasks. We explore possible explanations for this finding and provide a set of adaptation guidelines for the NLP practitioner.
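The two adaptation strategies compared in the abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the Hugging Face `transformers` library, and the `bert-base-uncased` checkpoint, linear task head, and learning rate are hypothetical placeholders.

```python
# Sketch of feature extraction vs. fine-tuning of a pretrained encoder.
# Assumes the Hugging Face `transformers` library; the checkpoint name,
# task head, and learning rate are illustrative, not the paper's exact setup.
import torch
from transformers import AutoModel

encoder = AutoModel.from_pretrained("bert-base-uncased")
task_head = torch.nn.Linear(encoder.config.hidden_size, 2)  # e.g. binary classification

feature_extraction = True  # set to False for fine-tuning

if feature_extraction:
    # Feature extraction: freeze the pretrained weights; only the task head is trained.
    for param in encoder.parameters():
        param.requires_grad = False
    trainable = list(task_head.parameters())
else:
    # Fine-tuning: update the pretrained weights and the task head jointly.
    trainable = list(encoder.parameters()) + list(task_head.parameters())

optimizer = torch.optim.Adam(trainable, lr=2e-5)
```

In both cases the forward pass is the same (encoder output fed to the task head); the two strategies differ only in which parameters the optimizer updates.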

Cite

APA

Peters, M. E., Ruder, S., & Smith, N. A. (2019). To tune or not to tune? Adapting pretrained representations to diverse tasks. In Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP 2019) (pp. 7–14). Association for Computational Linguistics. https://doi.org/10.18653/v1/w19-4302
