Spectral analysis for billion-scale graphs: Discoveries and implementation


Abstract

Given a graph with billions of nodes and edges, how can we find patterns and anomalies? Are there nodes that participate in too many or too few triangles? Are there close-knit near-cliques? These questions are expensive to answer unless we have the first several eigenvalues and eigenvectors of the graph adjacency matrix. However, eigensolvers suffer from subtle problems (e.g., convergence) for large sparse matrices, let alone for billion-scale ones. We address this problem with the proposed HEigen algorithm, which we carefully design to be accurate, efficient, and able to run on the highly scalable MapReduce (Hadoop) environment. This enables HEigen to handle matrices more than 1000× larger than those which can be analyzed by existing algorithms. We implement HEigen and run it on the M45 cluster, one of the top 50 supercomputers in the world. We report important discoveries about near-cliques and triangles on several real-world graphs, including a snapshot of the Twitter social network (38Gb, 2 billion edges) and the "YahooWeb" dataset, one of the largest publicly available graphs (120Gb, 1.4 billion nodes, 6.6 billion edges). © 2011 Springer-Verlag.
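
To illustrate why the leading eigenpairs of the adjacency matrix make triangle questions cheaper, here is a minimal sketch of the standard spectral identities for an undirected simple graph: the total triangle count equals one sixth of the sum of cubed eigenvalues, and the count at node i equals one half of the sum of cubed eigenvalues weighted by the squared i-th entries of the eigenvectors. This is not HEigen itself. It assumes a small dense matrix and uses NumPy's `eigh`; the function name `triangles_from_spectrum`, the parameter `k`, and the toy graph are hypothetical choices made for illustration.

```python
import numpy as np


def triangles_from_spectrum(adj, k=None):
    """Triangle counts of an undirected simple graph from its adjacency spectrum.

    Illustrative helper (not from the paper). With k=None the full spectrum is
    used and the counts are exact up to floating-point error. With an integer k,
    only the k eigenvalues of largest magnitude are kept, giving an approximation.
    """
    adj = np.asarray(adj, dtype=float)
    # eigh is appropriate because the adjacency matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(adj)

    if k is not None:
        # Keep the k eigenpairs whose eigenvalues have the largest magnitude.
        top = np.argsort(-np.abs(eigvals))[:k]
        eigvals, eigvecs = eigvals[top], eigvecs[:, top]

    cubes = eigvals ** 3
    # Global count: trace(A^3) / 6 = sum(lambda^3) / 6.
    total = cubes.sum() / 6.0
    # Per-node count: (A^3)_ii / 2 = sum_j lambda_j^3 * u_j[i]^2 / 2.
    per_node = (eigvecs ** 2) @ cubes / 2.0
    return total, per_node


# Toy graph (an assumption for the demo): a 4-clique on nodes 0..3
# plus a pendant node 4 attached to node 0.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
A = np.zeros((5, 5))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0

total, per_node = triangles_from_spectrum(A)
print(round(total, 6))       # about 4 triangles in the 4-clique
print(np.round(per_node, 6)) # about 3 for each clique node, 0 for node 4
```

Passing a small `k` trades accuracy for cost, which is the regime that matters at billion scale, where a full eigendecomposition is out of reach and a sparse iterative solver would replace the dense call used here.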

Cite

Kang, U., Meeder, B., & Faloutsos, C. (2011). Spectral analysis for billion-scale graphs: Discoveries and implementation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6635 LNAI, pp. 13–25). Springer Verlag. https://doi.org/10.1007/978-3-642-20847-8_2
