Intelligent Learning Rate Distribution to Reduce Catastrophic Forgetting in Transformers

Abstract

Pretraining language models on large text corpora is common practice in natural language processing, and these models are then fine-tuned to achieve the best results on a variety of downstream tasks. In this paper, we investigate the problem of catastrophic forgetting in transformer neural networks and question the common practice of fine-tuning with a single flat learning rate for the entire network. We run a hyperparameter optimization process to find learning rate distributions over the network that outperform a flat learning rate. We combine the learning rate distributions found in this way and show that they generalize, yielding better performance with respect to catastrophic forgetting. We validate these learning rate distributions on a variety of NLP benchmarks from the GLUE dataset. The source code is open-source and free software, available at https://github.com/TheMody/NAS-CatastrophicForgetting.
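To make the core idea concrete, the sketch below shows one way to assign per-layer learning rates when fine-tuning a transformer in PyTorch, instead of a single flat rate. This is only an illustration of the general technique: the model name, base learning rate, geometric decay factor, and parameter grouping are assumptions, not the distribution found by the paper's hyperparameter search.

```python
# Illustrative sketch: per-layer learning rates for transformer fine-tuning.
# Values and grouping are assumptions for illustration only.
import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

base_lr = 2e-5   # assumed base learning rate for the top of the network
decay = 0.9      # assumed per-layer decay factor (earlier layers train slower)

param_groups = []
# BERT-base has 12 encoder layers; give each layer its own learning rate,
# decreasing geometrically toward the input layers.
num_layers = len(model.bert.encoder.layer)
for i, layer in enumerate(model.bert.encoder.layer):
    param_groups.append({
        "params": layer.parameters(),
        "lr": base_lr * (decay ** (num_layers - 1 - i)),
    })
# Embeddings get the smallest rate; pooler and classifier head get the full rate.
param_groups.append({"params": model.bert.embeddings.parameters(),
                     "lr": base_lr * (decay ** num_layers)})
param_groups.append({"params": model.bert.pooler.parameters(), "lr": base_lr})
param_groups.append({"params": model.classifier.parameters(), "lr": base_lr})

optimizer = torch.optim.AdamW(param_groups, weight_decay=0.01)
```

In practice, the per-layer rates would come from the hyperparameter optimization the paper describes rather than a fixed decay schedule; the sketch only shows how such a distribution plugs into a standard optimizer via parameter groups.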

Citation (APA)

Kenneweg, P., Schulz, A., Schröder, S., & Hammer, B. (2022). Intelligent Learning Rate Distribution to Reduce Catastrophic Forgetting in Transformers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13756 LNCS, pp. 252–261). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21753-1_25
