A convergence analysis of distributed SGD with communication-efficient gradient sparsification


Abstract

Gradient sparsification is a promising technique to significantly reduce the communication overhead in distributed synchronous stochastic gradient descent (S-SGD) algorithms. Yet, many existing gradient sparsification schemes (e.g., Top-k sparsification) have a communication complexity of O(kP), where k is the number of gradient entries selected by each worker and P is the number of workers. Recently, the gTop-k sparsification scheme has been proposed to reduce the communication complexity from O(kP) to O(k log P), which significantly boosts system scalability. However, it remains unclear whether the gTop-k sparsification scheme converges in theory. In this paper, we first provide theoretical proofs of the convergence of the gTop-k scheme for non-convex objective functions under certain analytic assumptions. We then derive the convergence rate of gTop-k S-SGD, which is of the same order as that of vanilla mini-batch SGD. Finally, we conduct extensive experiments on different machine learning models and data sets to verify the soundness of the assumptions and theoretical results, and we discuss the impact of the compression ratio on convergence performance.
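To make the abstract's complexity contrast concrete, below is a minimal single-process sketch of Top-k and gTop-k sparsification in Python/NumPy. It is an illustrative simulation, not the authors' implementation: the worker count P, gradient dimension d, and sparsity k are made-up values, P is assumed to be a power of two, and the in-memory pairwise-merge loop stands in for the real inter-worker communication. With plain Top-k, every worker's k selected values must ultimately be gathered from all P workers (O(kP) traffic), whereas gTop-k merges the sparse vectors pairwise over log2(P) rounds and keeps only the global top-k after each merge, so each worker exchanges O(k log P) values.

# Minimal, single-process sketch of Top-k vs. gTop-k sparsification.
# Hypothetical simulation for illustration only; P, d, k are arbitrary,
# and P is assumed to be a power of two so the pairwise merge pans out.
import numpy as np

def top_k(dense, k):
    """Return a copy of `dense` keeping only its k largest-magnitude entries."""
    idx = np.argsort(np.abs(dense))[-k:]           # indices of the k largest magnitudes
    sparse = np.zeros_like(dense)
    sparse[idx] = dense[idx]
    return sparse

def gtop_k(local_grads, k):
    """Merge per-worker Top-k gradients pairwise, keeping the global top-k each round."""
    merged = [top_k(g, k) for g in local_grads]    # each worker starts with its own Top-k
    while len(merged) > 1:                         # log2(P) communication rounds
        nxt = []
        for i in range(0, len(merged), 2):
            pair_sum = merged[i] + merged[i + 1]   # combine two k-sparse gradients
            nxt.append(top_k(pair_sum, k))         # keep only the global top-k of the pair
        merged = nxt
    return merged[0]                               # globally sparsified gradient (<= k non-zeros)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, d, k = 4, 16, 3                             # 4 workers, 16-dim gradient, keep 3 entries
    grads = [rng.normal(size=d) for _ in range(P)]
    print(gtop_k(grads, k))

The key point of the sketch is the merge step: summing two k-sparse vectors and re-selecting the top-k keeps the message size fixed at k across all rounds, which is what brings the per-worker traffic down from O(kP) to O(k log P).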



Citation (APA)

Shi, S., Zhao, K., Wang, Q., Tang, Z., & Chu, X. (2019). A convergence analysis of distributed SGD with communication-efficient gradient sparsification. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 3411–3417). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/473

