Elastic parameter server load distribution in deep learning clusters


Abstract

In distributed DNN training, parameter servers (PSs) can become performance bottlenecks due to PS stragglers, caused by imbalanced parameter distribution, bandwidth contention, or computation interference. Few existing studies have investigated efficient parameter (i.e., load) distribution among PSs. We observe significant training inefficiency with the current parameter assignment in representative machine learning frameworks (e.g., MXNet, TensorFlow), and substantial potential for training acceleration with better PS load distribution. We design PSLD, a dynamic parameter server load distribution scheme, to mitigate PS straggler issues and accelerate distributed model training in the PS architecture. A carefully designed exploitation-exploration method scales parameter servers in and out and adjusts the parameter distribution among PSs on the fly. We also design an elastic PS scaling module that carries out our scheme with little interruption to the training process. We implement our module on top of open-source PS architectures, including MXNet and BytePS. Testbed experiments show up to a 2.86x speed-up in model training with PSLD, for different ML models under various straggler settings.
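
The exploitation-exploration idea in the abstract can be pictured as a periodic rebalancing loop: usually migrate a parameter block from the slowest PS to the currently fastest one (exploitation), and occasionally probe a random destination to keep load estimates fresh (exploration). The sketch below is a minimal, hypothetical Python illustration; the names (rebalance, EXPLORE_PROB, the dict-based bookkeeping) are our own assumptions for exposition, not PSLD's actual interface.

import random

EXPLORE_PROB = 0.1  # chance of a random (exploration) move instead of the greedy one

def rebalance(load_by_server, blocks_by_server):
    """Migrate one parameter block away from the most loaded PS.

    load_by_server:   {server_id: measured per-iteration push/pull latency}
    blocks_by_server: {server_id: list of parameter block ids}
    Assumes at least two servers.
    """
    slowest = max(load_by_server, key=load_by_server.get)
    if random.random() < EXPLORE_PROB:
        # Exploration: try a random destination to refresh its load estimate.
        target = random.choice([s for s in load_by_server if s != slowest])
    else:
        # Exploitation: shift load to the server that is currently fastest.
        target = min(load_by_server, key=load_by_server.get)
    if blocks_by_server[slowest]:
        block = blocks_by_server[slowest].pop()
        blocks_by_server[target].append(block)
    return blocks_by_server

In PSLD itself, such decisions are additionally paired with elastically adding or removing PS instances; the sketch captures only the block-migration step between existing servers.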

Citation (APA)

Chen, Y., Peng, Y., Bao, Y., Wu, C., Zhu, Y., & Guo, C. (2020). Elastic parameter server load distribution in deep learning clusters. In SoCC 2020 - Proceedings of the 2020 ACM Symposium on Cloud Computing (pp. 507–521). Association for Computing Machinery, Inc. https://doi.org/10.1145/3419111.3421307
