IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization


Abstract

Fine-tuning pre-trained language models (PTLMs), such as BERT and its variant RoBERTa, has become a common practice for advancing performance on natural language understanding (NLU) tasks. Recent advances in representation learning show that isotropic (i.e., unit-variance and uncorrelated) embeddings can significantly improve performance on downstream tasks, with faster convergence and better generalization. The isotropy of the pre-trained embeddings in PTLMs, however, remains relatively under-explored. In this paper, we analyze the isotropy of the pre-trained [CLS] embeddings of PTLMs with straightforward visualization, and identify two major issues: high variance in their standard deviations, and high correlation between different dimensions. We then propose a new network regularization method, isotropic batch normalization (IsoBN), that addresses these issues and encourages more isotropic representations during fine-tuning by dynamically penalizing dominating principal components. This simple yet effective fine-tuning method yields an absolute improvement of about 1.0 point on the average score of seven NLU tasks.
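As a minimal sketch of the kind of analysis the abstract describes (not the authors' code), the snippet below computes the two quantities highlighted above for a batch of pooled [CLS] embeddings: the per-dimension standard deviations and the average absolute pairwise correlation. It assumes PyTorch and that the embeddings have already been collected from a pre-trained encoder; the function name and thresholds are illustrative only.

import torch

def isotropy_diagnostics(cls_embeddings: torch.Tensor) -> dict:
    # cls_embeddings: [num_examples, hidden_dim] pooled [CLS] vectors,
    # e.g. gathered from a frozen pre-trained encoder over sample inputs.
    std = cls_embeddings.std(dim=0)  # per-dimension standard deviation
    centered = cls_embeddings - cls_embeddings.mean(dim=0, keepdim=True)
    cov = centered.T @ centered / (cls_embeddings.size(0) - 1)
    corr = cov / (std.unsqueeze(0) * std.unsqueeze(1) + 1e-8)  # correlation matrix
    off_diag = corr - torch.diag(torch.diag(corr))
    return {
        "std_min": std.min().item(),
        "std_max": std.max().item(),            # large spread -> far from unit variance
        "mean_abs_corr": off_diag.abs().mean().item(),  # large value -> correlated dimensions
    }

A large gap between std_min and std_max, or a large mean absolute off-diagonal correlation, indicates the anisotropy issues that IsoBN is designed to mitigate.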

Citation (APA)

Zhou, W., Lin, B. Y., & Ren, X. (2021). IsoBN: Fine-Tuning BERT with Isotropic Batch Normalization. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 16, pp. 14621–14629). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i16.17718
