Abstract
Most studies in cross-device federated learning focus on small models, due to server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. Through the systematic application of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a 21M-parameter Transformer that achieves the same perplexity as a similarly sized LSTM with ∼10× smaller client-to-server communication cost, and 11% lower perplexity than smaller LSTMs commonly studied in the literature.
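The abstract names quantization as one lever for cutting client-to-server communication. As a minimal sketch of that idea (not the scheme used in the paper, whose details are not given here), the snippet below applies stochastic uniform quantization to a client's model delta before upload: transmitting b-bit integer codes instead of float32 values reduces the payload by roughly 32/b. All function and variable names are illustrative assumptions.

```python
import numpy as np

def quantize_update(update: np.ndarray, num_bits: int = 4):
    """Stochastically quantize a model delta to 2**num_bits levels.

    Returns integer codes plus the offset/scale needed to dequantize.
    Rounding up with probability equal to the fractional part makes the
    quantizer unbiased in expectation.
    """
    levels = 2 ** num_bits - 1
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (update - lo) / scale
    floor = np.floor(normalized)
    codes = floor + (np.random.rand(*update.shape) < (normalized - floor))
    return codes.astype(np.uint8), lo, scale

def dequantize_update(codes: np.ndarray, lo: float, scale: float):
    """Server-side reconstruction of a quantized client delta."""
    return codes.astype(np.float32) * scale + lo

# Example: a fake client delta for one weight matrix.
delta = np.random.randn(256, 128).astype(np.float32)
codes, lo, scale = quantize_update(delta, num_bits=4)
recovered = dequantize_update(codes, lo, scale)
print("mean abs error:", np.abs(recovered - delta).mean())
```

At 4 bits per value this alone gives an 8× upload reduction; combining it with partial model training (uploading updates for only a subset of layers) is one plausible route to the ∼10× figure quoted above.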
Citation
Ro, J. H., Breiner, T., McConnaughey, L., Chen, M., Suresh, A. T., Kumar, S., & Mathews, R. (2022). Scaling Language Model Size in Cross-Device Federated Learning. In FL4NLP 2022 - 1st Workshop on Federated Learning for Natural Language Processing, Proceedings of the Workshop (pp. 6–20). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.fl4nlp-1.2