Abstract
Most studies in cross-device federated learning focus on small models, due to server-client communication and on-device computation bottlenecks. In this work, we leverage various techniques for mitigating these bottlenecks to train larger language models in cross-device federated learning. Through the systematic application of partial model training, quantization, efficient transfer learning, and communication-efficient optimizers, we are able to train a 21M-parameter Transformer that achieves the same perplexity as a similarly sized LSTM with ∼10× smaller client-to-server communication cost, and 11% lower perplexity than smaller LSTMs commonly studied in the literature.
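The abstract names quantization as one lever for cutting client-to-server communication. As a minimal sketch of that idea (not the scheme used in the paper, whose details are not given here), the snippet below applies stochastic uniform quantization to a client's model delta before upload: transmitting b-bit integer codes instead of float32 values reduces the payload by roughly 32/b. All function and variable names are illustrative assumptions.

```python
import numpy as np

def quantize_update(update: np.ndarray, num_bits: int = 4):
    """Stochastically quantize a model delta to 2**num_bits levels.

    Returns integer codes plus the offset/scale needed to dequantize.
    Rounding up with probability equal to the fractional part makes the
    quantizer unbiased in expectation.
    """
    levels = 2 ** num_bits - 1
    lo, hi = float(update.min()), float(update.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    normalized = (update - lo) / scale
    floor = np.floor(normalized)
    codes = floor + (np.random.rand(*update.shape) < (normalized - floor))
    return codes.astype(np.uint8), lo, scale

def dequantize_update(codes: np.ndarray, lo: float, scale: float):
    """Server-side reconstruction of a quantized client delta."""
    return codes.astype(np.float32) * scale + lo

# Example: a fake client delta for one weight matrix.
delta = np.random.randn(256, 128).astype(np.float32)
codes, lo, scale = quantize_update(delta, num_bits=4)
recovered = dequantize_update(codes, lo, scale)
print("mean abs error:", np.abs(recovered - delta).mean())
```

At 4 bits per value this alone gives an 8× upload reduction; combining it with partial model training (uploading updates for only a subset of layers) is one plausible route to the ∼10× figure quoted above.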
Citation
Ro, J. H., Breiner, T., McConnaughey, L., Chen, M., Suresh, A. T., Kumar, S., & Mathews, R. (2022). Scaling Language Model Size in Cross-Device Federated Learning. In FL4NLP 2022 - 1st Workshop on Federated Learning for Natural Language Processing, Proceedings of the Workshop (pp. 6–20). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.fl4nlp-1.2