Exploring Robust Overfitting for Pre-trained Language Models


Abstract

We identify the robust overfitting issue for pre-trained language models by showing that the robust test loss increases as training epochs grow. Through a comprehensive exploration of the robust loss on the training set, we attribute robust overfitting to the model's memorization of the adversarial training data. We attempt to mitigate robust overfitting by combining regularization methods with adversarial training. Following the philosophy of preventing the model from memorizing the adversarial data, we find that flooding, a regularization method with loss scaling, can mitigate robust overfitting for pre-trained language models. Finally, we investigate the effect of different flood levels and evaluate the models' adversarial robustness under textual adversarial attacks. Extensive experiments demonstrate that our method can mitigate robust overfitting on top of three strong adversarial training methods and further improve adversarial robustness.
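The flooding regularizer mentioned in the abstract (Ishida et al., 2020) rescales the training loss to |L − b| + b for a flood level b, so the loss hovers around b instead of approaching zero. Below is a minimal PyTorch sketch of how flooding could be combined with an adversarial training step; the model, inputs, and flood level here are illustrative assumptions, not the authors' exact setup.

```python
import torch

def flooding_loss(loss: torch.Tensor, flood_level: float) -> torch.Tensor:
    """Flooding (Ishida et al., 2020): |L - b| + b.
    When the loss drops below the flood level b, the absolute value
    flips the gradient sign, performing mild gradient ascent and
    discouraging memorization of the (adversarial) training data."""
    return (loss - flood_level).abs() + flood_level

def training_step(model, adv_inputs, labels, optimizer, b=0.1):
    """One illustrative step: compute the loss on adversarial inputs,
    then apply flooding before backpropagation. `model`, `adv_inputs`,
    and b=0.1 are hypothetical placeholders."""
    criterion = torch.nn.CrossEntropyLoss()
    logits = model(**adv_inputs).logits   # e.g., a HuggingFace classifier output
    loss = criterion(logits, labels)      # standard loss on adversarial examples
    loss = flooding_loss(loss, b)         # keep training loss near the flood level
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch the flood level b acts as a floor on the training loss; the paper studies how the choice of this level affects robust overfitting.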

Citation

Zhu, B., & Rao, Y. (2023). Exploring Robust Overfitting for Pre-trained Language Models. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 5506–5522). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.340
