BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models


Abstract

We introduce BitFit, a sparse fine-tuning method in which only the bias terms of the model (or a subset of them) are modified. We show that with small-to-medium training data, applying BitFit to pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the method is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings bear on the question of what the commonly used fine-tuning process actually does: they support the hypothesis that fine-tuning mainly exposes knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
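The method described in the abstract amounts to freezing every pre-trained weight matrix and updating only the bias vectors. Below is a minimal sketch in PyTorch with Hugging Face Transformers, assuming a BERT sequence-classification setup; the model name, task head, and learning rate are illustrative choices, not values prescribed by the paper.

import torch
from transformers import AutoModelForSequenceClassification

# Illustrative model/task choice; bias-only tuning applies to any BERT-style encoder.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

# Freeze all parameters, then unfreeze only the bias terms
# (and, typically, the randomly initialized task-specific head).
for name, param in model.named_parameters():
    param.requires_grad = name.endswith(".bias") or name.startswith("classifier")

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable: {trainable:,} / {total:,} parameters ({100 * trainable / total:.2f}%)")

# Only the unfrozen parameters are handed to the optimizer; training then
# proceeds exactly as in ordinary full fine-tuning.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)

Because bias terms make up only a small fraction of a percent of BERT's parameters, a single copy of the pre-trained weights can be shared across tasks, with only the per-task bias values (plus the task head) stored separately.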

Citation (APA)

Ben-Zaken, E., Ravfogel, S., & Goldberg, Y. (2022). BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2, pp. 1–9). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.acl-short.1
