Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning


Abstract

The Graph Neural Network (GNN) is showing outstanding results in improving the performance of graph-based applications. Recent studies demonstrate that GNN performance can be boosted by using more advanced aggregators, deeper aggregation depths, larger sampling rates, etc. While leading to promising results, these improvements come at the cost of a significantly increased memory footprint that easily exceeds GPU memory capacity. In this paper, we introduce a method, Betty, to make GNN training more scalable and accessible via batch-level partitioning. Unlike DNN training, a mini-batch in GNN training has complex dependencies between input features and output labels, which makes batch-level partitioning difficult. Betty introduces two novel techniques, redundancy-embedded graph (REG) partitioning and memory-aware partitioning, to effectively mitigate the redundancy and load-imbalance issues across partitions. Our evaluation on large-scale real-world datasets shows that Betty significantly mitigates the memory bottleneck, enabling scalable GNN training with much deeper aggregation depths, larger sampling rates, larger training batch sizes, and more advanced aggregators, with as few as a single GPU.
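To make the general idea of batch-level partitioning concrete, below is a minimal, hypothetical sketch (not Betty's actual algorithm): the target nodes of one sampled mini-batch are split into smaller micro-batches that are processed sequentially with gradient accumulation, so peak GPU memory is bounded by the largest micro-batch rather than the full mini-batch. The helper sample_subgraph and the model(feats, adj) signature are assumptions for illustration; Betty's REG and memory-aware partitioning additionally reduce redundant neighbor copies across partitions and balance their memory footprints, which this sketch does not attempt.

import torch

def train_one_batch(model, loss_fn, optimizer, sample_subgraph,
                    target_nodes, labels, num_partitions=4):
    """Train on one GNN mini-batch by splitting its target nodes into
    `num_partitions` micro-batches (illustrative sketch only).

    `sample_subgraph(targets)` is an assumed helper that returns the input
    features and adjacency structure needed to compute embeddings for
    `targets` (e.g., a neighbor-sampled block from a DGL/PyG-style loader).
    """
    optimizer.zero_grad()
    target_chunks = torch.chunk(target_nodes, num_partitions)
    label_chunks = torch.chunk(labels, num_partitions)
    for targets, y in zip(target_chunks, label_chunks):
        feats, adj = sample_subgraph(targets)   # dependencies for this micro-batch only
        out = model(feats, adj)                 # forward pass on the smaller subgraph
        # Scale so the accumulated gradient matches one full mini-batch update.
        loss = loss_fn(out, y) * (len(targets) / len(target_nodes))
        loss.backward()                         # accumulate gradients across micro-batches
    optimizer.step()                            # single parameter update per mini-batch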

Cite

APA

Yang, S., Zhang, M., Dong, W., & Li, D. (2023). Betty: Enabling Large-Scale GNN Training with Batch-Level Graph Partitioning. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (Vol. 2, pp. 103–117). Association for Computing Machinery. https://doi.org/10.1145/3575693.3575725
