We design and evaluate a new algorithm called FlexCompressSGD for training deep neural networks over distributed datasets using multiple workers and a central server. In FlexCompressSGD, all gradients transmitted between workers and the server are compressed, and each worker is allowed to flexibly choose a compression method different from that of the server. This flexibility significantly reduces the communication cost from each worker to the server. We mathematically prove that FlexCompressSGD converges with rate $O(1/\sqrt{MT})$, where $M$ is the number of distributed workers and $T$ is the number of training iterations. We experimentally demonstrate that FlexCompressSGD obtains competitive top-1 testing accuracy on the ImageNet dataset while reducing the communication cost from each worker to the server by more than 70% compared with the state-of-the-art.
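To make the flexible-compression idea concrete, below is a minimal sketch of one communication round in which workers and the server use different compressors. The specific choices (top-k on the worker-to-server uplink, sign compression on the server broadcast), the function names, and the toy dimensions are illustrative assumptions for exposition; they are not the paper's exact algorithm or analysis.

```python
import numpy as np

def topk_compress(g, k):
    """Keep the k largest-magnitude entries of g, zero out the rest (assumed worker compressor)."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def sign_compress(g):
    """1-bit sign compression scaled by the mean magnitude (assumed server compressor)."""
    return np.sign(g) * np.mean(np.abs(g))

def simulate_round(worker_grads, k, lr, params):
    """One round: workers compress and send; the server aggregates,
    compresses with its own (different) method, and broadcasts the update."""
    # Worker -> server: each worker applies its own compressor.
    uplink = [topk_compress(g, k) for g in worker_grads]
    # Server: average the compressed gradients, then compress the broadcast differently.
    averaged = np.mean(uplink, axis=0)
    downlink = sign_compress(averaged)
    # All workers apply the same compressed update.
    return params - lr * downlink

# Toy usage: 4 workers, a 10-dimensional parameter vector.
rng = np.random.default_rng(0)
params = rng.normal(size=10)
grads = [rng.normal(size=10) for _ in range(4)]
params = simulate_round(grads, k=3, lr=0.1, params=params)
print(params)
```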
Phuong, T. T., & Phong, L. T. (2020). Distributed SGD with Flexible Gradient Compression. IEEE Access, 8, 64707–64717. https://doi.org/10.1109/ACCESS.2020.2984633