Hypothesis testing for automated community detection in networks

Peter J. Bickel; Purnamrita Sarkar

Journal of the Royal Statistical Society. Series B: Statistical Methodology (2016) 78(1) 253-273

DOI: 10.1111/rssb.12117

Abstract

Community detection in networks is a key exploratory tool with applications in a diverse set of areas, ranging from finding communities in social and biological networks to identifying link farms in the World Wide Web. The problem of finding communities or clusters in a network has received much attention from statistics, physics and computer science. However, most clustering algorithms assume knowledge of the number of clusters k. We propose to determine k automatically in a graph generated from a stochastic block model by using a hypothesis test of independent interest. Our main contribution is twofold. First, we theoretically establish the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix, and we use that distribution to test the hypothesis that a random graph is of Erdős–Rényi (noise) type. Second, we use this test to design a recursive bipartitioning algorithm, which naturally uncovers nested community structure. Using simulations and quantifiable classification tasks on real-world networks with ground truth, we show that our algorithm outperforms state-of-the-art methods.

Citation (APA)

Bickel, P. J., & Sarkar, P. (2016). Hypothesis testing for automated community detection in networks. Journal of the Royal Statistical Society. Series B: Statistical Methodology, 78(1), 253–273. https://doi.org/10.1111/rssb.12117