A semi-clustering scheme for high performance PageRank on Hadoop

Seungtae Hong; Jeonghoon Lee; Jaewoo Chang; Dong Hoon Choi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014) 8823 35-44

DOI: 10.1007/978-3-319-12256-4_4

Abstract

As global Internet business evolves, large-scale graphs are becoming increasingly common. PageRank computation on large-scale graphs using Hadoop with the default data partitioning method suffers from poor performance, because Hadoop scatters even a set of directly connected vertices across arbitrary nodes. In this paper we propose a semi-clustering scheme to address this problem and improve the performance of PageRank on Hadoop. Our scheme divides a graph into a set of semi-clusters, each of which consists of connected vertices, and assigns each semi-cluster to a single data partition in order to reduce the cost of data exchange between nodes during the PageRank computation. The semi-clusters are merged and split before the PageRank computation so that a large-scale graph is distributed evenly across the data partitions. Our semi-clustering scheme drastically improves performance: the total elapsed time, including the cost of the semi-clustering computation, is reduced by up to 36%. Furthermore, the effectiveness of our scheme increases as the size of the graph increases.
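
The partitioning idea in the abstract can be illustrated with a small, single-machine sketch. This is not the authors' algorithm and does not use Hadoop: it assumes that plain connected components can stand in for semi-clusters, that oversized groups are split by simple slicing, and that small groups are merged by greedy placement on the least-loaded partition. The function names (`connected_groups`, `assign_partitions`, `cut_edges`) and the `capacity` parameter are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def connected_groups(vertices, edges):
    """Group vertices into connected components using union-find.

    Assumption: components stand in for the paper's semi-clusters.
    """
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        parent[find(u)] = find(w)

    groups = defaultdict(list)
    for v in vertices:
        groups[find(v)].append(v)
    return list(groups.values())


def assign_partitions(groups, num_partitions, capacity):
    """Split groups larger than `capacity`, then place the pieces
    largest-first on the least-loaded partition, which in effect
    merges small groups into the same partition.

    The split is a plain slice and ignores connectivity inside a group.
    """
    pieces = []
    for group in groups:
        for i in range(0, len(group), capacity):
            pieces.append(group[i:i + capacity])
    pieces.sort(key=len, reverse=True)

    loads = [0] * num_partitions
    placement = {}
    for piece in pieces:
        target = loads.index(min(loads))
        for v in piece:
            placement[v] = target
        loads[target] += len(piece)
    return placement


def cut_edges(edges, placement):
    """Count edges whose endpoints sit in different partitions."""
    return sum(1 for u, w in edges if placement[u] != placement[w])


# Toy graph: one chain of four vertices and two separate pairs.
vertices = list(range(8))
edges = [(0, 1), (1, 2), (2, 3), (4, 5), (6, 7)]

clustered = assign_partitions(connected_groups(vertices, edges),
                              num_partitions=2, capacity=4)
hashed = {v: v % 2 for v in vertices}  # stand-in for hash partitioning

print(cut_edges(edges, clustered))  # 0
print(cut_edges(edges, hashed))     # 5
```

On this toy graph the connectivity-aware placement keeps every edge inside one partition while the modulo placement cuts all five, and both put four vertices on each partition. Cut edges are used here only as a rough proxy for the data exchanged between nodes in each PageRank iteration; the sketch says nothing about the elapsed-time figures reported in the abstract.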

Cite

Hong, S., Lee, J., Chang, J., & Choi, D. H. (2014). A semi-clustering scheme for high performance PageRank on Hadoop. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8823, pp. 35–44). Springer Verlag. https://doi.org/10.1007/978-3-319-12256-4_4
