Large-scale network and graph analysis has received considerable attention recently. Graph mining techniques often involve an iterative algorithm, which can be implemented in a variety of ways. Using PageRank as a model problem, we examine three algorithm design axes (work activation, data access pattern, and scheduling) and investigate the impact of the choices along each axis. Using these design axes, we design and test a variety of PageRank implementations, finding that data-driven, push-based algorithms achieve more than 28x the performance of standard PageRank implementations (e.g., those in GraphLab). The design choices affect both single-threaded performance and parallel scalability. The implementation lessons not only guide efficient implementations of many graph mining algorithms, but also provide a framework for designing new scalable algorithms.
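To make the "data-driven, push-based" combination concrete, the following is a minimal single-threaded sketch, not the authors' implementation. It assumes the unnormalized PageRank formulation x = alpha * P^T x + (1 - alpha) * e, a FIFO worklist as the scheduling policy, and that rank mass arriving at vertices without out-edges is simply dropped. The names `push_pagerank`, `out_edges`, `alpha`, and `eps` are hypothetical and chosen only for illustration.

```python
from collections import deque


def push_pagerank(out_edges, alpha=0.85, eps=1e-8):
    """Residual-based PageRank sketch (illustrative, under stated assumptions).

    out_edges[v] is the list of vertices that v links to.
    Work activation: data-driven, only vertices on the worklist are processed.
    Data access:     push-based, an active vertex writes to its out-neighbors.
    Scheduling:      FIFO worklist (an assumption; other policies are possible).
    """
    n = len(out_edges)
    rank = [0.0] * n
    residual = [1.0 - alpha] * n   # rank mass not yet absorbed at each vertex
    worklist = deque(range(n))     # every vertex starts active
    queued = [True] * n            # avoids duplicate worklist entries

    while worklist:
        v = worklist.popleft()
        queued[v] = False

        # Absorb the pending residual into the rank of v.
        r = residual[v]
        residual[v] = 0.0
        rank[v] += r

        targets = out_edges[v]
        if not targets:
            continue               # assumption: dangling mass is dropped

        # Push a damped, equal share of the residual to each out-neighbor.
        share = alpha * r / len(targets)
        for w in targets:
            residual[w] += share
            # Activate w only once its residual is large enough to matter.
            if residual[w] >= eps and not queued[w]:
                queued[w] = True
                worklist.append(w)

    return rank


if __name__ == "__main__":
    # Tiny hypothetical graph: 0 -> 1, 1 -> 0, 2 -> 0.
    print(push_pagerank([[1], [0], [0]]))
```

In this sketch a vertex is revisited only when its residual reaches `eps`, so work concentrates where values are still changing, whereas a topology-driven loop would recompute every vertex in every round. Each push removes at least (1 - alpha) * eps of total residual, which is why the loop terminates.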
Whang, J. J., Lenharth, A., Dhillon, I. S., & Pingali, K. (2015). Scalable data-driven PageRank: Algorithms, system issues, and lessons learned. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9233, pp. 438–450). Springer Verlag. https://doi.org/10.1007/978-3-662-48096-0_34