While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-Base on GLUE tasks at a fraction of the original pretraining cost.
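The abstract describes the recipe only at a high level; the sketch below illustrates what a generic masked-language-model pretraining setup on a single server can look like with the Hugging Face libraries. It is not the authors' tuned recipe: the dataset, sequence length, batch size, learning rate, and step budget are all illustrative assumptions.

```python
# Minimal sketch of masked-language-model pretraining from scratch.
# Illustrative only; hyperparameters are assumptions, not the paper's.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Small public corpus, used purely for illustration.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def tokenize(batch):
    # Short sequences keep per-step cost low on a single server (assumption).
    return tokenizer(batch["text"], truncation=True, max_length=128)

dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

# Standard MLM objective: randomly mask 15% of input tokens.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

model = BertForMaskedLM(BertConfig())  # randomly initialized, BERT-Base-sized

args = TrainingArguments(
    output_dir="mlm-from-scratch",
    per_device_train_batch_size=32,   # illustrative value
    gradient_accumulation_steps=8,    # simulate a larger effective batch
    learning_rate=1e-4,               # illustrative value
    max_steps=100_000,                # train to a compute budget, not epochs
    fp16=True,                        # mixed precision, one common software optimization
    logging_steps=500,
    save_steps=10_000,
)

Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=collator,
).train()
```

Under a fixed time budget, the main levers are the ones the abstract names: software optimizations (e.g., mixed precision), design choices (sequence length, effective batch size), and hyperparameter tuning (learning rate, schedule).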
Izsak, P., Berchansky, M., & Levy, O. (2021). How to Train BERT with an Academic Budget. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) (pp. 10644–10652). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.emnlp-main.831