NeutronStar: Distributed GNN Training with Hybrid Dependency Management

Abstract

GNN training must resolve vertex dependencies: each vertex's representation is updated from the representations of its neighbors. Existing distributed GNN systems adopt either a dependencies-cached approach or a dependencies-communicated approach. Through extensive experiments and analysis, we find that which approach performs best is determined by a set of factors, including the input graph, the model configuration, and the underlying computing cluster. A system that supports all GNN training workloads with only one approach therefore often delivers suboptimal performance. We analyze these factors for each GNN training job before execution to choose the best-fit approach accordingly, and we propose a hybrid dependency-handling approach that adaptively combines the merits of the two approaches at runtime. Based on this hybrid approach, we develop NeutronStar, a distributed GNN training system that achieves high performance automatically. NeutronStar is further strengthened by effective optimizations in CPU-GPU computation and data processing. Experimental results on a 16-node Aliyun cluster demonstrate that NeutronStar achieves 1.81X-14.25X speedups over existing GNN systems, including DistDGL and ROC.
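To make the trade-off concrete, the following is a minimal, hypothetical sketch of the idea the abstract describes: estimate the cost of the two dependency-handling strategies for a given workload and cluster, and pick the cheaper one. The cost model, class names, and numbers below are illustrative assumptions, not NeutronStar's actual implementation.

```python
# Hypothetical cost-model-based selection between the two dependency-handling
# strategies described in the abstract. All names and formulas are assumptions
# for illustration only.

from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    CACHED = "dependencies-cached"              # replicate remote neighbors locally and recompute them
    COMMUNICATED = "dependencies-communicated"  # pull remote embeddings over the network each step


@dataclass
class LayerProfile:
    boundary_vertices: int      # vertices with neighbors on other workers
    avg_remote_degree: float    # average remote neighbors per boundary vertex
    hidden_dim: int             # embedding width produced by this GNN layer
    flops_per_vertex: float     # estimated compute cost to (re)compute one vertex
    bytes_per_float: float = 4.0


def estimate_cached_cost(p: LayerProfile, flops_per_sec: float) -> float:
    """Dependencies-cached: pay redundant computation for replicated remote neighbors."""
    redundant_vertices = p.boundary_vertices * p.avg_remote_degree
    return redundant_vertices * p.flops_per_vertex / flops_per_sec


def estimate_communicated_cost(p: LayerProfile, net_bytes_per_sec: float) -> float:
    """Dependencies-communicated: pay network transfer for remote embeddings."""
    payload = p.boundary_vertices * p.avg_remote_degree * p.hidden_dim * p.bytes_per_float
    return payload / net_bytes_per_sec


def choose_strategy(p: LayerProfile, flops_per_sec: float, net_bytes_per_sec: float) -> Strategy:
    """Pick whichever strategy the (assumed) cost model estimates to be cheaper."""
    if estimate_cached_cost(p, flops_per_sec) <= estimate_communicated_cost(p, net_bytes_per_sec):
        return Strategy.CACHED
    return Strategy.COMMUNICATED


if __name__ == "__main__":
    # Illustrative numbers: compute-cheap layers on slow networks tend toward caching,
    # while wide embeddings on fast networks tend toward communicating dependencies.
    layer = LayerProfile(boundary_vertices=50_000, avg_remote_degree=8.0,
                         hidden_dim=256, flops_per_vertex=1e6)
    print(choose_strategy(layer, flops_per_sec=5e12, net_bytes_per_sec=1.25e9))
```

In a real system such a decision would also depend on model depth, partitioning, and GPU/CPU placement; the sketch only shows the shape of the adaptive choice, not the factors NeutronStar actually weighs.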

Citation (APA)

Wang, Q., Zhang, Y., Wang, H., Chen, C., Zhang, X., & Yu, G. (2022). NeutronStar: Distributed GNN Training with Hybrid Dependency Management. In Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 1301–1315). Association for Computing Machinery. https://doi.org/10.1145/3514221.3526134
