Distributed training performance is constrained by two factors: the communication overhead between parameter servers and workers, and the imbalance in computing power across workers. We propose a dynamic-delay-based cyclic gradient update method, which allows workers to push gradients to parameter servers in a round-robin order with dynamic delays. Stale gradient information is accumulated locally in each worker; when a worker obtains the token to update gradients, the accumulated gradients are pushed to the parameter servers. Experiments show that, compared with previous synchronous and cyclic gradient update methods, the dynamic-delay cyclic method converges to the same accuracy faster.
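The following is a minimal single-process sketch of the cyclic update idea described in the abstract: each worker accumulates its gradients locally and pushes them to the parameter server only when it holds the round-robin token. The names (ParameterServer, Worker, compute_gradient) and the toy quadratic loss are illustrative assumptions, not the authors' implementation; the dynamic adjustment of delays is only noted in a comment, and a real system would use RPC or collective communication instead of in-process calls.

```python
import numpy as np

class ParameterServer:
    def __init__(self, dim, lr=0.1):
        self.params = np.zeros(dim)
        self.lr = lr

    def push(self, grad):
        # Apply the accumulated gradient from whichever worker holds the token.
        self.params -= self.lr * grad

    def pull(self):
        return self.params.copy()

class Worker:
    def __init__(self, wid, dim):
        self.wid = wid
        self.acc = np.zeros(dim)  # locally accumulated (possibly stale) gradients

    def step(self, ps, has_token):
        params = ps.pull()
        grad = compute_gradient(params, self.wid)  # stand-in for a mini-batch gradient
        self.acc += grad                           # accumulate stale gradient locally
        if has_token:
            ps.push(self.acc)                      # push accumulated gradients on this worker's turn
            self.acc[:] = 0.0

def compute_gradient(params, seed):
    # Toy quadratic loss ||params - 1||^2 with noise, just to make the sketch runnable.
    rng = np.random.default_rng(seed)
    return 2.0 * (params - 1.0) + 0.01 * rng.standard_normal(params.shape)

if __name__ == "__main__":
    dim, n_workers = 4, 3
    ps = ParameterServer(dim)
    workers = [Worker(w, dim) for w in range(n_workers)]
    token = 0  # round-robin token; a dynamic-delay scheme would adapt how long each worker waits for it
    for it in range(30):
        for w in workers:
            w.step(ps, has_token=(w.wid == token))
        token = (token + 1) % n_workers  # pass the token cyclically
    print(ps.params)
```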
Citation: Hu, W., Wang, P., Wang, Q., Zhou, Z., Xiang, H., Li, M., & Shi, Z. (2018). Dynamic delay based cyclic gradient update method for distributed training. In Lecture Notes in Computer Science (Vol. 11258 LNCS, pp. 550–559). Springer Verlag. https://doi.org/10.1007/978-3-030-03338-5_46