Modeling and Optimizing the Scaling Performance in Distributed Deep Learning Training

Abstract

Distributed Deep Learning (DDL) is widely used to accelerate deep neural network training for various Web applications. In each iteration of DDL training, every worker synchronizes neural network gradients with the other workers. This synchronization introduces communication overhead and degrades the scaling performance. In this paper, we propose a recursive model, OSF (Scaling Factor considering Overlap), for estimating the scaling performance of DDL training of neural network models, given the settings of the DDL system. OSF captures two main characteristics of DDL training: the overlap between computation and communication, and the tensor fusion used to batch gradient updates. Measurements on a real-world DDL system show that OSF achieves a low estimation error (ranging from 0.5% to 8.4% across different models). Using OSF, we identify the factors that degrade the scaling performance and propose solutions to effectively mitigate their impacts. In particular, the proposed adaptive tensor fusion improves the scaling performance by 32.2% to 150% compared to a constant tensor fusion buffer size.
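The abstract's core idea (only the fraction of communication not hidden behind computation hurts scaling) can be illustrated with a minimal sketch. This is not the paper's recursive OSF model; the function names, the single `overlap_fraction` parameter, and the simple additive timing model are illustrative assumptions:

```python
# Illustrative sketch, NOT the paper's OSF model: estimate per-iteration
# time when gradient communication partially overlaps with computation,
# and derive a scaling factor as compute time over total iteration time.

def iteration_time(t_comp: float, t_comm: float, overlap_fraction: float) -> float:
    """Time for one training iteration.

    t_comp           -- computation time per iteration (seconds)
    t_comm           -- gradient communication time per iteration (seconds)
    overlap_fraction -- fraction of t_comm hidden behind computation (0..1)
    """
    # Only the communication that is NOT overlapped extends the iteration.
    exposed_comm = t_comm * (1.0 - overlap_fraction)
    return t_comp + exposed_comm


def scaling_factor(t_comp: float, t_comm: float, overlap_fraction: float) -> float:
    """Ratio of ideal (communication-free) time to actual iteration time.

    1.0 means perfect scaling; lower values mean communication overhead
    is degrading the scaling performance.
    """
    return t_comp / iteration_time(t_comp, t_comm, overlap_fraction)


if __name__ == "__main__":
    # With full overlap, communication is completely hidden.
    print(scaling_factor(1.0, 0.5, 1.0))  # 1.0
    # With no overlap, communication is fully exposed on the critical path.
    print(scaling_factor(1.0, 0.5, 0.0))  # ~0.667
```

Under this toy model, increasing the overlap fraction (e.g., via better tensor fusion scheduling) directly raises the scaling factor, which is the intuition behind the paper's adaptive tensor fusion.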

Citation (APA)
Liu, T., Miao, T., Wu, Q., Li, Z., He, G., Wu, J., … Xie, G. (2022). Modeling and Optimizing the Scaling Performance in Distributed Deep Learning Training. In WWW 2022 - Proceedings of the ACM Web Conference 2022 (pp. 1764–1773). Association for Computing Machinery, Inc. https://doi.org/10.1145/3485447.3511981
