A convergence analysis of distributed SGD with communication-efficient gradient sparsification


Abstract

Gradient sparsification is a promising technique to significantly reduce the communication overhead in distributed synchronous stochastic gradient descent (S-SGD) algorithms. Yet, many existing gradient sparsification schemes (e.g., Top-k sparsification) have a communication complexity of O(kP), where k is the number of gradient entries selected by each worker and P is the number of workers. Recently, the gTop-k sparsification scheme has been proposed to reduce the communication complexity from O(kP) to O(k log P), which significantly boosts system scalability. However, it remains unclear whether the gTop-k sparsification scheme converges in theory. In this paper, we first provide theoretical proofs of the convergence of the gTop-k scheme for non-convex objective functions under certain analytic assumptions. We then derive the convergence rate of gTop-k S-SGD, which is of the same order as that of vanilla mini-batch SGD. Finally, we conduct extensive experiments on different machine learning models and data sets to verify the soundness of the assumptions and theoretical results, and we discuss the impact of the compression ratio on convergence performance.
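To make the abstract's complexity contrast concrete, below is a minimal single-process sketch of Top-k and gTop-k sparsification in Python/NumPy. It is an illustrative simulation, not the authors' implementation: the worker count P, gradient dimension d, and sparsity k are made-up values, P is assumed to be a power of two, and the in-memory pairwise-merge loop stands in for the real inter-worker communication. With plain Top-k, every worker's k selected values must ultimately be gathered from all P workers (O(kP) traffic), whereas gTop-k merges the sparse vectors pairwise over log2(P) rounds and keeps only the global top-k after each merge, so each worker exchanges O(k log P) values.

# Minimal, single-process sketch of Top-k vs. gTop-k sparsification.
# Hypothetical simulation for illustration only; P, d, k are arbitrary,
# and P is assumed to be a power of two so the pairwise merge pans out.
import numpy as np

def top_k(dense, k):
    """Return a copy of `dense` keeping only its k largest-magnitude entries."""
    idx = np.argsort(np.abs(dense))[-k:]           # indices of the k largest magnitudes
    sparse = np.zeros_like(dense)
    sparse[idx] = dense[idx]
    return sparse

def gtop_k(local_grads, k):
    """Merge per-worker Top-k gradients pairwise, keeping the global top-k each round."""
    merged = [top_k(g, k) for g in local_grads]    # each worker starts with its own Top-k
    while len(merged) > 1:                         # log2(P) communication rounds
        nxt = []
        for i in range(0, len(merged), 2):
            pair_sum = merged[i] + merged[i + 1]   # combine two k-sparse gradients
            nxt.append(top_k(pair_sum, k))         # keep only the global top-k of the pair
        merged = nxt
    return merged[0]                               # globally sparsified gradient (<= k non-zeros)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    P, d, k = 4, 16, 3                             # 4 workers, 16-dim gradient, keep 3 entries
    grads = [rng.normal(size=d) for _ in range(P)]
    print(gtop_k(grads, k))

The key point of the sketch is the merge step: summing two k-sparse vectors and re-selecting the top-k keeps the message size fixed at k across all rounds, which is what brings the per-worker traffic down from O(kP) to O(k log P).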



Citation (APA)

Shi, S., Zhao, K., Wang, Q., Tang, Z., & Chu, X. (2019). A convergence analysis of distributed SGD with communication-efficient gradient sparsification. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 3411–3417). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/473

