Collective communication algorithms are an important component of distributed computation. Indeed, in the case of deep learning, collective communication is the Amdahl's law bottleneck of data-parallel training. This paper introduces SCCL (for Synthesized Collective Communication Library), a systematic approach to synthesizing collective communication algorithms that are explicitly tailored to a particular hardware topology. SCCL synthesizes algorithms along the Pareto frontier spanning from latency-optimal to bandwidth-optimal implementations of a collective. The paper demonstrates how to encode the synthesis problem as a quantifier-free SMT formula which can be discharged to a theorem prover, and shows how a carefully constructed encoding enables SCCL to scale. We synthesize novel latency- and bandwidth-optimal algorithms, not previously seen in the literature, for two popular hardware topologies. We also show how SCCL efficiently lowers algorithms to implementations on two hardware architectures (NVIDIA and AMD) and demonstrate competitive performance with hand-optimized collective communication libraries.
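To give a flavor of what such an SMT-based formulation looks like, the sketch below poses a toy version of the problem to the Z3 solver: find an Allgather schedule for a 4-node directed ring within a fixed number of steps, under a one-chunk-per-link-per-step bandwidth limit. This is an illustrative simplification, not SCCL's actual encoding; the variable names (`has`), the step count, and the bandwidth constraint are assumptions made for the example.

```python
# Illustrative sketch only: a toy quantifier-free SMT encoding, in the spirit of
# SCCL, asking Z3 for an Allgather schedule on a 4-node directed ring.
from z3 import AtMost, Bool, Not, Or, And, Solver, is_true, sat

N = 4        # nodes on a directed ring: node n receives only from (n - 1) % N
STEPS = 3    # try to finish Allgather in N - 1 steps

# has[n][c][t] is true iff node n holds chunk c after step t (t = 0 is the start).
has = [[[Bool(f"has_{n}_{c}_{t}") for t in range(STEPS + 1)]
        for c in range(N)]
       for n in range(N)]

s = Solver()
for n in range(N):
    pred = (n - 1) % N
    for c in range(N):
        # Initially node n holds exactly its own chunk; finally it holds all chunks.
        s.add(has[n][c][0] if n == c else Not(has[n][c][0]))
        s.add(has[n][c][STEPS])
        for t in range(STEPS):
            # Chunks are never dropped, and a chunk can appear at n only if n or
            # its ring predecessor already held it in the previous step.
            s.add(Or(Not(has[n][c][t]), has[n][c][t + 1]))
            s.add(Or(Not(has[n][c][t + 1]), has[n][c][t], has[pred][c][t]))

# Bandwidth constraint: each link delivers at most one new chunk per step.
for n in range(N):
    for t in range(STEPS):
        newly = [And(has[n][c][t + 1], Not(has[n][c][t])) for c in range(N)]
        s.add(AtMost(*newly, 1))

if s.check() == sat:
    m = s.model()
    for t in range(1, STEPS + 1):
        for n in range(N):
            moved = [c for c in range(N)
                     if is_true(m.evaluate(has[n][c][t], model_completion=True))
                     and not is_true(m.evaluate(has[n][c][t - 1], model_completion=True))]
            print(f"step {t}: node {(n - 1) % N} -> node {n} sends chunks {moved}")
```

In this toy setup the solver returns the familiar ring Allgather, where each node forwards one chunk to its successor per step; the paper's encoding generalizes this idea to arbitrary topologies, collectives, and points on the latency-bandwidth Pareto frontier.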
Citation
Cai, Z., Liu, Z., Maleki, S., Musuvathi, M., Mytkowicz, T., Nelson, J., & Saarikivi, O. (2021). Synthesizing optimal collective algorithms. In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) (pp. 62–75). Association for Computing Machinery. https://doi.org/10.1145/3437801.3441620