A semi-clustering scheme for high performance PageRank on Hadoop

1Citations
Citations of this article
6Readers
Mendeley users who have this article in their library.
Get full text

Abstract

As global Internet business has been evolving, large-scale graphs are becoming popular. PageRank computation on the large-scale graphs using Hadoop with default data partitioning method suffers from poor performance because Hadoop scatters even a set of directly connected vertices to arbitrary multiple nodes. In this paper we propose a semi-clustering scheme to address this problem and improve the performance of PageRank on Hadoop. Our scheme divides a graph into a set of semi-clusters, each of which consists of connected vertices, and assigns a semi-cluster to a single data partition in order to reduce the cost of data exchange between nodes during the computation of PageRank. The semi-clusters are merged and split before the PageRank computation, in order to evenly distribute a large-scale graph into a number of data partitions. Our semi-clustering scheme drastically improves the performance: total elapsed time including the cost of the semi-clustering computation reduced by up to 36%. Furthermore, the effectiveness of our scheme increases as the size of the graph increases.

Cite

CITATION STYLE

APA

Hong, S., Lee, J., Chang, J., & Choi, D. H. (2014). A semi-clustering scheme for high performance PageRank on Hadoop. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8823, pp. 35–44). Springer Verlag. https://doi.org/10.1007/978-3-319-12256-4_4

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free