This paper proposes Khuzdul, a distributed execution engine with a well-defined abstraction that can be integrated with existing single-machine graph pattern mining (GPM) systems to provide efficiency and scalability at the same time. The key novelty is the extendable embedding abstraction which can express pattern enumeration algorithms, allow fine-grained task scheduling, and enable low-cost GPM-specific data reuse to reduce communication cost. The effective BFS-DFS hybrid exploration generates sufficient concurrent tasks for communication-computation overlapping with bounded memory consumption. Two scalable distributed GPM systems are implemented by porting Automine and GraphPi on Khuzdul. Our evaluation shows that Khuzdul based systems significantly outperform state-of-the-art distributed GPM systems with partitioned graphs by up to 75.5× (on average 19.0×), achieve similar or even better performance compared with the fastest distributed GPM systems with replicated graph, and scale to massive graphs with more than one hundred billion edges with a commodity cluster.
CITATION STYLE
Chen, J., & Qian, X. (2023). Khuzdul: Efficient and Scalable Distributed Graph Pattern Mining Engine. In International Conference on Architectural Support for Programming Languages and Operating Systems - ASPLOS (Vol. 2, pp. 413–426). Association for Computing Machinery. https://doi.org/10.1145/3575693.3575743
Mendeley helps you to discover research relevant for your work.