DAPPLE: A pipelined data parallel approach for training large models

Shiqing Fan; Yi Rong; Chen Meng; Zongyan Cao; Siyu Wang; Zhen Zheng; Chuan Wu; Guoping Long; Jun Yang; Lixue Xia; Lansong Diao; Xiaoyong Liu; Wei Lin

Conference Proceedings (Open Access)


Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (2021) 431-445

DOI: 10.1145/3437801.3441593


Abstract

It is challenging to train large DNN models on sophisticated GPU platforms with diverse interconnect capabilities. Pipelined training has recently been proposed as an effective approach for improving device utilization. However, several tricky issues remain: improving computing efficiency while ensuring convergence, and reducing memory usage without incurring additional computing costs. We propose DAPPLE, a synchronous training framework that combines data parallelism and pipeline parallelism for large DNN models. It features a novel parallelization strategy planner that solves the partition and placement problems and explores optimal hybrid strategies of data and pipeline parallelism. We also propose a new runtime scheduling algorithm that reduces device memory usage; it is orthogonal to re-computation approaches and does not come at the expense of training throughput. Experiments show that strategies produced by the DAPPLE planner consistently outperform those generated by PipeDream's planner, with up to 3.23× speedup under synchronous training, and that the DAPPLE runtime achieves 1.6× higher training throughput than GPipe while reducing memory consumption by 12%.
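The abstract does not spell out how the runtime scheduler saves memory. One common explanation for this class of pipeline schedule, assumed here rather than taken from the abstract, is that running backward passes early (a "one forward, one backward" ordering after a short warm-up) bounds how many micro-batches' activations each stage must keep alive. The toy sketch below compares, for a single stage, a GPipe-style order (all forwards, then all backwards) with such an early-backward order. The function names and the warm-up rule `min(num_stages - stage, num_micro)` are illustrative assumptions, not DAPPLE's published API or exact policy, and the model ignores cross-stage timing, weights, and re-computation.

```python
from typing import List, Tuple

# An op is ("F" or "B", micro-batch index) executed on one pipeline stage.
Op = Tuple[str, int]


def gpipe_order(num_micro: int) -> List[Op]:
    """GPipe-style order: every forward pass first, then every backward pass."""
    forwards = [("F", m) for m in range(num_micro)]
    backwards = [("B", m) for m in range(num_micro)]
    return forwards + backwards


def early_backward_order(num_micro: int, num_stages: int, stage: int) -> List[Op]:
    """Hypothetical early-backward order for one stage (0-indexed).

    Assumption: the stage warms up with min(num_stages - stage, num_micro)
    forwards, then alternates one backward with one forward until done.
    """
    warmup = min(num_stages - stage, num_micro)
    ops: List[Op] = [("F", m) for m in range(warmup)]
    next_f, next_b = warmup, 0
    while next_b < num_micro:
        ops.append(("B", next_b))  # free one micro-batch's activations
        next_b += 1
        if next_f < num_micro:
            ops.append(("F", next_f))  # admit the next micro-batch
            next_f += 1
    return ops


def peak_in_flight(ops: List[Op]) -> int:
    """Peak number of micro-batches whose forward has run but backward has not."""
    live = peak = 0
    for kind, _ in ops:
        live += 1 if kind == "F" else -1
        peak = max(peak, live)
    return peak


if __name__ == "__main__":
    num_micro, num_stages = 8, 4
    for s in range(num_stages):
        g = peak_in_flight(gpipe_order(num_micro))
        e = peak_in_flight(early_backward_order(num_micro, num_stages, s))
        print(f"stage {s}: gpipe peak={g}, early-backward peak={e}")
```

With 8 micro-batches and 4 stages, the GPipe-style order peaks at 8 stored activations on every stage, while the early-backward order peaks at 4, 3, 2 and 1 for stages 0 to 3. In this toy model, memory is bounded by the pipeline depth instead of the number of micro-batches, which is the intuition behind reducing memory without extra computation.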


Citation (APA)

Fan, S., Rong, Y., Meng, C., Cao, Z., Wang, S., Zheng, Z., … Lin, W. (2021). DAPPLE: A pipelined data parallel approach for training large models. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP (pp. 431–445). Association for Computing Machinery. https://doi.org/10.1145/3437801.3441593
