How to Train BERT with an Academic Budget

35 citations · 187 Mendeley readers

Abstract

While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-BASE on GLUE tasks at a fraction of the original pretraining cost.
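For readers unfamiliar with the setup the abstract refers to, the sketch below shows what masked-language-model pretraining from scratch typically looks like using the HuggingFace Transformers Trainer. It is illustrative only: the dataset, model size, and hyperparameters are placeholders and do not reproduce the authors' 24-hour recipe or their specific software optimizations.

```python
# Minimal sketch of masked-LM pretraining from scratch (illustrative only;
# not the authors' exact 24-hour recipe).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Small public corpus, used here purely for illustration.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style masking: 15% of tokens are selected for prediction.
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

model = BertForMaskedLM(BertConfig())  # randomly initialized, BERT-Base sized

args = TrainingArguments(
    output_dir="mlm-from-scratch",
    per_device_train_batch_size=32,
    gradient_accumulation_steps=8,   # emulate a larger batch on one server
    learning_rate=1e-4,              # placeholder; tune for your budget
    max_steps=100_000,               # placeholder step count
    fp16=True,                       # mixed precision; requires a GPU
    save_steps=10_000,
    logging_steps=500,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=collator,
).train()
```

In practice, the gap between a sketch like this and a 24-hour budget comes from the kinds of levers the abstract names: efficient kernels and mixed precision, large effective batch sizes via gradient accumulation, short sequence lengths, and aggressive learning-rate schedules.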

Citation (APA)
Izsak, P., Berchansky, M., & Levy, O. (2021). How to Train BERT with an Academic Budget. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 10644–10652). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.831
