GraphX: Graph processing in a distributed dataflow framework

Joseph E. Gonzalez; Reynold S. Xin; Ankur Dave; Daniel Crankshaw; Michael J. Franklin; Ion Stoica

Conference Proceedings

Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2014), 2014, pp. 599–613

Abstract

In pursuit of graph processing performance, the systems community has largely abandoned general-purpose distributed dataflow frameworks in favor of specialized graph processing systems that provide tailored programming abstractions and accelerate the execution of iterative graph algorithms. In this paper we argue that many of the advantages of specialized graph processing systems can be recovered in a modern general-purpose distributed dataflow system. We introduce GraphX, an embedded graph processing framework built on top of Apache Spark, a widely used distributed dataflow system. GraphX presents a familiar composable graph abstraction that is sufficient to express existing graph APIs, yet can be implemented using only a few basic dataflow operators (e.g., join, map, group-by). To achieve performance parity with specialized graph systems, GraphX recasts graph-specific optimizations as distributed join optimizations and materialized view maintenance. By leveraging advances in distributed dataflow frameworks, GraphX brings low-cost fault tolerance to graph processing. We evaluate GraphX on real workloads and demonstrate that GraphX achieves an order of magnitude performance gain over the base dataflow framework and matches the performance of specialized graph processing systems while enabling a wider range of computation.
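The abstract says graph computation can be expressed with only a few dataflow operators (join, map, group-by). The Python sketch below illustrates that idea under stated assumptions. It is not GraphX's API, since GraphX is a Scala library on Apache Spark. In-memory lists stand in for distributed collections, and the helper names `join`, `group_by_key`, and `pagerank_step` are hypothetical. The sketch computes one iteration of the unnormalized PageRank variant, where rank = 0.15 + 0.85 × (sum of incoming contributions), in three steps:

- a join of the edges with source-vertex properties,
- a map that produces one message per edge,
- a group-by that sums the messages at each destination vertex.

```python
from collections import defaultdict

# Hypothetical stand-ins for dataflow operators; a real dataflow system would
# run these over partitioned, distributed collections, not Python lists.

def join(left, right):
    """Inner equi-join of two lists of (key, value) pairs on key."""
    index = defaultdict(list)
    for k, v in right:
        index[k].append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]

def group_by_key(pairs, reduce_fn):
    """Group (key, value) pairs by key and fold each group with reduce_fn."""
    groups = {}
    for k, v in pairs:
        groups[k] = reduce_fn(groups[k], v) if k in groups else v
    return groups

def pagerank_step(ranks, edges, damping=0.85):
    """One unnormalized PageRank iteration using only join, map, and group-by."""
    # Out-degree per source vertex: map each edge to (src, 1), then group-by and sum.
    out_deg = group_by_key([(src, 1) for src, _ in edges], lambda a, b: a + b)
    # Vertex properties keyed by id: (rank, out-degree) for vertices with out-edges.
    src_info = [(v, (r, out_deg[v])) for v, r in ranks.items() if v in out_deg]
    # Join edges keyed by src with the source vertex's properties ("triplets").
    triplets = join([(src, dst) for src, dst in edges], src_info)
    # Map each joined edge to a message (dst, contribution).
    msgs = [(dst, rank / deg) for _, (dst, (rank, deg)) in triplets]
    # Group-by destination and sum the contributions.
    sums = group_by_key(msgs, lambda a, b: a + b)
    # Vertices without in-edges receive only the base term.
    return {v: (1 - damping) + damping * sums.get(v, 0.0) for v in ranks}
```

For example, take edges (1,2), (1,3), (2,3), and (3,1) with every rank starting at 1.0. One step gives ranks of approximately 1.0, 0.575, and 1.425 for vertices 1, 2, and 3. In a distributed setting, the join's cost depends on how vertices and edges are partitioned. The abstract places GraphX's graph-specific optimizations in exactly that kind of join.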

Citation (APA)

Gonzalez, J. E., Xin, R. S., Dave, A., Crankshaw, D., Franklin, M. J., & Stoica, I. (2014). GraphX: Graph processing in a distributed dataflow framework. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2014 (pp. 599–613). USENIX Association.