Efficient sparse collective communication and its application to accelerate distributed deep learning

Jiawei Fei; Chen Yu Ho; Atal N. Sahu; Marco Canini; Amedeo Sapio

Conference Proceedings · Open Access


SIGCOMM 2021 - Proceedings of the ACM SIGCOMM 2021 Conference (2021) 676-691

DOI: 10.1145/3452296.3472904

127 Citations · 62 Readers


Abstract

Efficient collective communication is crucial to parallel-computing applications such as distributed training of large-scale recommendation systems and natural language processing models. Existing collective communication libraries focus on optimizing operations for dense inputs, so when inputs are sparse they transmit many zeros. This runs counter to current trends toward increasing data sparsity in large models. We propose OmniReduce, an efficient streaming aggregation system that exploits sparsity to maximize effective bandwidth use by sending only non-zero data blocks. We demonstrate that this idea is beneficial and accelerates distributed training by up to 8.2×. Even at 100 Gbps, OmniReduce delivers 1.4–2.9× better performance for network-bottlenecked DNNs.
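To make the core idea of "sending only non-zero data blocks" concrete, here is a minimal sketch in plain Python. It assumes a hypothetical fixed `BLOCK_SIZE` and hypothetical helpers `to_nonzero_blocks`, `aggregate`, and `allreduce_sparse`, none of which come from the paper. Each worker splits its gradient into fixed-size blocks and emits only the blocks that contain a non-zero value, keyed by block index. An aggregator then sums those blocks into a dense result, treating any block that no worker sent as zero. The sketch does not model OmniReduce's streaming protocol, packet format, or aggregator design. It only illustrates why block sparsity reduces the data a worker has to send.

```python
from typing import Dict, List

# Hypothetical block size, kept small for readability (not a value from the paper).
BLOCK_SIZE = 4


def to_nonzero_blocks(grad: List[float], block_size: int = BLOCK_SIZE) -> Dict[int, List[float]]:
    """Split a gradient into fixed-size blocks and keep only blocks with a non-zero value.

    Returns a mapping from block index to that block's values (the worker's "message").
    """
    msg: Dict[int, List[float]] = {}
    for start in range(0, len(grad), block_size):
        block = grad[start:start + block_size]  # the last block may be shorter
        if any(v != 0 for v in block):
            msg[start // block_size] = block
    return msg


def aggregate(messages: List[Dict[int, List[float]]], length: int,
              block_size: int = BLOCK_SIZE) -> List[float]:
    """Sum sparse block messages from all workers into one dense vector.

    Blocks that no worker sent are implicitly zero.
    """
    result = [0.0] * length
    for msg in messages:
        for idx, block in msg.items():
            start = idx * block_size
            for offset, v in enumerate(block):
                result[start + offset] += v
    return result


def allreduce_sparse(grads: List[List[float]], block_size: int = BLOCK_SIZE):
    """Hypothetical sparse all-reduce: returns the summed gradient and total blocks sent."""
    msgs = [to_nonzero_blocks(g, block_size) for g in grads]
    total = aggregate(msgs, len(grads[0]), block_size)
    blocks_sent = sum(len(m) for m in msgs)
    return total, blocks_sent


# Usage: two workers, 16-element gradients (4 blocks each), mostly zeros.
g0 = [0, 0, 0, 0, 1.0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.0, 0]
g1 = [0, 0, 0, 0, 0.5, 0, 0, 0, 3.0, 0, 0, 0, 0, 0, 0, 0]
total, sent = allreduce_sparse([g0, g1])
print(total)
print(f"blocks sent: {sent} of {2 * 4} for a dense exchange")
```

Here each worker sends 2 of its 4 blocks, so 4 blocks cross the network instead of 8, yet the result equals the dense element-wise sum. The real savings depend on how non-zeros cluster into blocks. Scattered non-zeros would touch most blocks, which is one reason block size is a design choice rather than a fixed constant.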

Citation (APA)

Fei, J., Ho, C. Y., Sahu, A. N., Canini, M., & Sapio, A. (2021). Efficient sparse collective communication and its application to accelerate distributed deep learning. In SIGCOMM 2021 - Proceedings of the ACM SIGCOMM 2021 Conference (pp. 676–691). Association for Computing Machinery, Inc. https://doi.org/10.1145/3452296.3472904