Scaling Language Model Size in Cross-Device Federated Learning

Abstract

Most studies in cross-device federated learning focus on small models, due to the server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. With systematic applications of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a 21M-parameter Transformer that achieves the same perplexity as a similarly sized LSTM with ~10x smaller client-to-server communication cost, and 11% lower perplexity than the smaller LSTMs commonly studied in the literature.
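The abstract names quantization as one of the techniques used to shrink client-to-server communication, but does not spell out the exact scheme. The sketch below illustrates the general idea with a generic uniform stochastic quantizer applied to a client model update; the function names, bit width, and rounding scheme are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def quantize_update(update, num_bits=8, seed=0):
    """Stochastically quantize a client model update to num_bits per value.

    A generic uniform stochastic quantizer, shown only to illustrate how
    compressing client-to-server updates reduces communication cost; this
    is not the exact scheme from the paper.
    """
    rng = np.random.default_rng(seed)
    levels = 2 ** num_bits - 1
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    # Map values onto [0, levels], then round stochastically so the
    # quantizer is unbiased in expectation.
    normalized = (update - lo) / scale
    floor = np.floor(normalized)
    q = floor + (rng.random(update.shape) < (normalized - floor))
    dtype = np.uint8 if num_bits <= 8 else np.uint16
    return q.astype(dtype), lo, scale

def dequantize_update(q, lo, scale):
    """Server-side reconstruction of the quantized update."""
    return q.astype(np.float32) * scale + lo

# Example: a float32 update for a 21M-parameter model is roughly 84 MB;
# at 8 bits per value the payload drops to roughly 21 MB (about 4x),
# before any further compression.
update = np.random.randn(1000).astype(np.float32)
q, lo, scale = quantize_update(update)
recovered = dequantize_update(q, lo, scale)
print(float(np.abs(update - recovered).max()))
```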

Citation (APA)

Ro, J. H., Breiner, T., McConnaughey, L., Chen, M., Suresh, A. T., Kumar, S., & Mathews, R. (2022). Scaling Language Model Size in Cross-Device Federated Learning. In FL4NLP 2022 - 1st Workshop on Federated Learning for Natural Language Processing, Proceedings of the Workshop (pp. 6–20). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.fl4nlp-1.2
