Efficient Triangle Counting In Large Graphs Via Degree-Based Vertex Partitioning

Mihail N. Kolountzakis; Gary L. Miller; Richard Peng; Charalampos E. Tsourakakis

Journal Article (Open Access)

Internet Mathematics (2012) 8(1-2) 161-185

DOI: 10.1080/15427951.2012.625260

Abstract

The number of triangles is a computationally expensive graph statistic frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model), and in important real-world applications such as spam detection, uncovering the hidden thematic structures in the Web, and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms that run fast, use little space, provide accurate estimates of the number of triangles, and preferably are parallelizable. In this paper we present an efficient triangle-counting approximation algorithm that can be adapted to the semistreaming model [Feigenbaum et al. 05]. Its key idea is to combine the sampling algorithm of [Tsourakakis et al. 09, Tsourakakis et al. 11] and the partitioning of the set of vertices into high- and low-degree subsets as in [Alon et al. 97], treating each set appropriately. From a mathematical perspective, we present a simplified proof of [Tsourakakis et al. 11] that uses the powerful Kim–Vu concentration inequality [Kim and Vu 00] based on the Hajnal–Szemerédi theorem [Hajnal and Szemerédi 70]. Furthermore, we improve bounds of existing triple-sampling techniques based on a theorem of [Ahlswede and Katona 78]. We obtain a running time (Formula presented.) and an (Formula presented.) approximation, where n is the number of vertices, m is the number of edges, and Δ is the maximum number of triangles in which any single edge is contained. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage (Formula presented.) and a constant number of passes (three) over the graph stream. We apply our methods to various networks with several millions of edges and we obtain excellent results, outperforming existing triangle-counting methods.
Finally, we propose a random-projection-based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.
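
The edge-sampling idea that the abstract attributes to [Tsourakakis et al. 09] can be illustrated with a minimal Python sketch. It assumes the usual form of that estimator: keep each edge independently with probability p, count triangles exactly in the sparsified graph, and rescale by 1/p^3, since a triangle survives only if all three of its edges are kept. This is not the paper's full algorithm. The degree-based vertex partitioning and the triple-sampling step are omitted, and the function names `count_triangles` and `estimate_triangles` are hypothetical.

```python
import random


def count_triangles(edges):
    """Exact triangle count of a simple undirected graph given as (u, v) pairs."""
    adj = {}
    unique_edges = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        unique_edges.add((min(u, v), max(u, v)))
    # Each edge (u, v) lies in |N(u) & N(v)| triangles; summing over all edges
    # counts every triangle three times, once per edge.
    total = sum(len(adj[u] & adj[v]) for u, v in unique_edges)
    return total // 3


def estimate_triangles(edges, p, seed=None):
    """Sparsify-and-rescale estimate (assumed form of the sampling step)."""
    rng = random.Random(seed)
    # Keep each edge independently with probability p.
    kept = [e for e in edges if rng.random() < p]
    # A triangle survives with probability p**3, so rescale by 1 / p**3.
    return count_triangles(kept) / p ** 3
```

With p = 1 every edge is kept and the estimate equals the exact count. Smaller values of p reduce the work done on the sparsified graph at the cost of higher variance in the estimate.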

Cite

Kolountzakis, M. N., Miller, G. L., Peng, R., & Tsourakakis, C. E. (2012). Efficient Triangle Counting In Large Graphs Via Degree-Based Vertex Partitioning. Internet Mathematics, 8(1–2), 161–185. https://doi.org/10.1080/15427951.2012.625260