Abstract
Pretrained language models have served as the backbone for many state-of-the-art NLP results. These models are large and expensive to train. Recent work suggests that continued pretraining on task-specific data is worth the effort because it improves performance on downstream tasks. We explore alternatives to full-scale task-specific pretraining of language models through the use of adapter modules, a parameter-efficient approach to transfer learning. We find that adapter-based pretraining achieves results comparable to task-specific pretraining while updating only a fraction of the overall trainable parameters. We further explore direct use of adapters without pretraining and find that direct fine-tuning performs mostly on par with pretrained adapter models, contradicting the benefits previously attributed to continued pretraining in full pretraining-then-fine-tuning strategies. Lastly, we perform an ablation study on task-adaptive pretraining to investigate how different hyperparameter settings affect the effectiveness of the pretraining.
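To make the adapter setup concrete, below is a minimal sketch of a bottleneck adapter of the kind the abstract refers to (a down-projection, nonlinearity, and up-projection with a residual connection, inserted into a frozen backbone). The hidden size, bottleneck size, and class names are illustrative assumptions, not values or code taken from the paper.

```python
import torch
import torch.nn as nn


class Adapter(nn.Module):
    """Bottleneck adapter: down-project, nonlinearity, up-project, residual add.

    Only these parameters are trained; the pretrained backbone stays frozen,
    which is what makes the approach parameter-efficient.
    """

    def __init__(self, hidden_size: int = 768, bottleneck_size: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck_size)  # project to small bottleneck
        self.up = nn.Linear(bottleneck_size, hidden_size)    # project back to model width
        self.act = nn.GELU()

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Residual connection keeps the backbone's representation intact
        # while the adapter learns a small task-specific correction.
        return hidden_states + self.up(self.act(self.down(hidden_states)))


# Usage sketch: freeze a backbone's parameters and train only the adapters.
if __name__ == "__main__":
    adapter = Adapter()
    x = torch.randn(2, 16, 768)  # (batch, sequence length, hidden size)
    print(adapter(x).shape)      # torch.Size([2, 16, 768])
```

In practice such modules are inserted after the transformer sub-layers, and either pretrained on task-specific data or fine-tuned directly, which is the comparison the paper draws.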
Citation
Kim, S., Shum, A., Susanj, N., & Hilgart, J. (2021). Revisiting Pretraining with Adapters. In RepL4NLP 2021 - 6th Workshop on Representation Learning for NLP, Proceedings of the Workshop (pp. 90–99). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.repl4nlp-1.11