While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-Base on GLUE tasks at a fraction of the original pretraining cost.
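The abstract describes the recipe only at a high level; the sketch below illustrates what a generic masked-language-model pretraining setup on a single server can look like with the Hugging Face libraries. It is not the authors' tuned recipe: the dataset, sequence length, batch size, learning rate, and step budget are all illustrative assumptions.

```python
# Minimal sketch of masked-language-model pretraining from scratch.
# Illustrative only; hyperparameters are assumptions, not the paper's.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Small public corpus, used purely for illustration.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def tokenize(batch):
    # Short sequences keep per-step cost low on a single server (assumption).
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of input tokens.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

model = BertForMaskedLM(BertConfig())  # randomly initialized, BERT-Base-sized

args = TrainingArguments(
    output_dir="mlm-from-scratch",
    per_device_train_batch_size=32,   # illustrative value
    gradient_accumulation_steps=8,    # simulate a larger effective batch
    learning_rate=1e-4,               # illustrative value
    max_steps=100_000,                # train to a compute budget, not epochs
    fp16=True,                        # mixed precision, one common software optimization
    logging_steps=500,
    save_steps=10_000,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=collator,
).train()
```

Under a fixed time budget, the main levers are the ones the abstract names: software optimizations (e.g., mixed precision), design choices (sequence length, effective batch size), and hyperparameter tuning (learning rate, schedule).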
Izsak, P., Berchansky, M., & Levy, O. (2021). How to Train BERT with an Academic Budget. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) (pp. 10644–10652). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.831