SMART: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization

Citations: 230
Readers (Mendeley): 431

Abstract

Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. However, due to limited data resources from downstream tasks and the extremely high complexity of pre-trained models, aggressive fine-tuning often causes the fine-tuned model to overfit the training data of downstream tasks and fail to generalize to unseen data. To address this issue in a principled manner, we propose a new learning framework for robust and efficient fine-tuning of pre-trained models to attain better generalization performance. The proposed framework contains two important ingredients: (1) smoothness-inducing regularization, which effectively manages the complexity of the model; and (2) Bregman proximal point optimization, which is an instance of trust-region methods and prevents aggressive updating. Our experiments show that the proposed framework achieves new state-of-the-art performance on a number of NLP tasks including GLUE, SNLI, SciTail and ANLI. Moreover, it outperforms the state-of-the-art T5 model, the largest pre-trained model with 11 billion parameters, on GLUE.
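The two ingredients lend themselves to a short illustration. The following is a minimal, hedged PyTorch-style sketch, not the authors' released implementation: it assumes `model` maps input embeddings directly to classification logits, approximates the inner maximization of the smoothness-inducing (adversarial) term with a single ascent step on a random perturbation, and approximates the Bregman proximal point term with a symmetric KL penalty against a frozen copy of the model from the previous outer iteration. Names such as `smart_objective`, `smoothness_loss`, `lambda_s`, and `mu` are illustrative placeholders.

```python
# Minimal sketch of SMART's two ingredients (not the authors' implementation).
import torch
import torch.nn.functional as F

def symmetric_kl(p_logits, q_logits):
    """Symmetric KL divergence between two categorical distributions given as logits."""
    p_log = F.log_softmax(p_logits, dim=-1)
    q_log = F.log_softmax(q_logits, dim=-1)
    p, q = p_log.exp(), q_log.exp()
    return (p * (p_log - q_log) + q * (q_log - p_log)).sum(dim=-1).mean()

def smoothness_loss(model, embeds, logits, eps=1e-3, step_size=1e-3):
    """Smoothness-inducing term: one ascent step approximates the worst-case perturbation."""
    delta = torch.randn_like(embeds) * eps
    delta.requires_grad_()
    adv_loss = symmetric_kl(model(embeds + delta), logits.detach())
    grad, = torch.autograd.grad(adv_loss, delta)
    # One normalized ascent step, then project back onto the eps-ball (L-inf here).
    delta = (delta + step_size * grad / (grad.norm() + 1e-12)).clamp(-eps, eps).detach()
    return symmetric_kl(model(embeds + delta), logits)

def smart_objective(model, prev_model, embeds, labels, lambda_s=1.0, mu=1.0):
    """Task loss + smoothness-inducing term + Bregman proximal point term."""
    logits = model(embeds)
    task_loss = F.cross_entropy(logits, labels)
    smooth = smoothness_loss(model, embeds, logits)
    # Bregman proximal point: penalize drifting (in prediction space) away from
    # a frozen copy of the model taken at the previous outer iteration.
    with torch.no_grad():
        prev_logits = prev_model(embeds)
    proximal = symmetric_kl(logits, prev_logits)
    return task_loss + lambda_s * smooth + mu * proximal
```

In training, `prev_model` would be a frozen copy of `model` (e.g., via `copy.deepcopy`) refreshed at each outer Bregman iteration, with several optimizer steps on `smart_objective` approximately solving each proximal subproblem; the paper additionally describes an accelerated variant that updates the reference model with an exponential moving average, which this sketch omits.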



Citation (APA)

Jiang, H., He, P., Chen, W., Liu, X., Gao, J., & Zhao, T. (2020). SMART: Robust and efficient fine-tuning for pre-trained natural language models through principled regularized optimization. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 2177–2190). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2020.acl-main.197

Readers' Seniority

PhD / Post grad / Masters / Doc: 148 (75%)
Researcher: 37 (19%)
Lecturer / Post doc: 7 (4%)
Professor / Associate Prof.: 5 (3%)

Readers' Discipline

Computer Science: 204 (88%)
Engineering: 16 (7%)
Linguistics: 7 (3%)
Mathematics: 5 (2%)
