Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models

Abstract

Scaling models to larger sizes to improve performance has become a trend in deep learning, and the sparsely activated Mixture-of-Experts (MoE) is a promising architecture for scaling models. However, training MoE models in existing systems is expensive, mainly due to the All-to-All communication between layers. All-to-All communication originates from the expert-centric paradigm: keeping experts in place and exchanging intermediate data to feed them. We propose a novel data-centric paradigm: keeping data in place and moving experts between GPUs. Since the size of the experts can be smaller than the size of the data, the data-centric paradigm can reduce the communication workload. Based on this insight, we develop Janus. First, Janus supports fine-grained asynchronous communication, which can overlap computation and communication. Janus also implements hierarchical communication to further reduce cross-node traffic by sharing fetched experts within the same machine. Second, when scheduling "fetching expert" requests, Janus implements a topology-aware priority strategy to utilize intra-node and inter-node links efficiently. Finally, Janus allows experts to be prefetched, which allows the downstream computation to start immediately once the previous step completes. Evaluated on a 32-A100 cluster, Janus can reduce traffic by up to 16× and achieves up to a 2.06× speedup compared with current MoE training systems.
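
The trade-off behind the data-centric paradigm can be illustrated with a rough back-of-the-envelope comparison. The sketch below is not taken from Janus; it assumes a simplified cost model in which an expert is a two-matrix feed-forward block, every routed token copy crosses the network twice under the expert-centric paradigm (dispatch and return), and each remote expert is fetched once under the data-centric paradigm. All function and parameter names are hypothetical.

```python
def expert_centric_bytes(tokens_per_gpu, hidden, top_k, bytes_per_elem=2):
    """Bytes one GPU sends for one MoE layer's forward pass when tokens move.

    Assumption: each of the top_k copies of a token is sent to an expert and
    the result is sent back (two All-to-All exchanges), and every copy is
    counted as remote traffic.
    """
    return 2 * tokens_per_gpu * top_k * hidden * bytes_per_elem


def data_centric_bytes(remote_experts, hidden, ffn_hidden, bytes_per_elem=2):
    """Bytes one GPU receives for one MoE layer's forward pass when experts move.

    Assumption: each expert holds two weight matrices
    (hidden x ffn_hidden and ffn_hidden x hidden); biases are ignored.
    """
    params_per_expert = 2 * hidden * ffn_hidden
    return remote_experts * params_per_expert * bytes_per_elem


def traffic_ratio(tokens_per_gpu, hidden, top_k, remote_experts, ffn_hidden):
    """Expert-centric traffic divided by data-centric traffic.

    A value above 1 means moving experts is cheaper under this model.
    """
    moved_tokens = expert_centric_bytes(tokens_per_gpu, hidden, top_k)
    moved_experts = data_centric_bytes(remote_experts, hidden, ffn_hidden)
    return moved_tokens / moved_experts


if __name__ == "__main__":
    # Illustrative numbers only, not measurements from the paper.
    ratio = traffic_ratio(
        tokens_per_gpu=8192, hidden=1024, top_k=2,
        remote_experts=3, ffn_hidden=4096,
    )
    print(f"expert-centric / data-centric traffic: {ratio:.2f}")
```

Under this model the ratio grows with the number of tokens per GPU and shrinks with the number and size of remote experts, which is consistent with the abstract's condition that the experts be smaller than the data for the data-centric paradigm to pay off.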

Cite

Liu, J., Wang, J. H., & Jiang, Y. (2023). Janus: A Unified Distributed Training Framework for Sparse Mixture-of-Experts Models. In SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference (pp. 486–498). Association for Computing Machinery, Inc. https://doi.org/10.1145/3603269.3604869
