Scan statistics are a popular approach for detecting "hotspots" and "anomalies" in spatio-temporal and network data. The methodology involves maximizing a score function over all connected subgraphs, which is NP-hard in general. A number of heuristics have been proposed for these problems, but they provide no quality guarantees. In this paper, we develop a framework for designing algorithms that optimize a large class of scan statistics for networks, subject to connectivity constraints. Our algorithms run in time that scales linearly in the size of the graph and depends on a parameter we call the "effective solution size", while providing rigorous approximation guarantees. In contrast, most prior methods have super-linear running times in terms of graph size. Extensive empirical evidence demonstrates the effectiveness and efficiency of our proposed algorithms in comparison with state-of-the-art methods. Our approach outperforms all prior methods, increasing the score by more than 25% in some cases. Further, our algorithms scale to networks with up to a million nodes, which is 1-2 orders of magnitude larger than all prior applications.
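To make the optimization problem concrete, the following minimal sketch enumerates every connected node subset of a tiny graph and keeps the one with the highest score. It is an exhaustive baseline for illustration only, not the algorithm developed in the paper. The score function (total node weight divided by the square root of the subset size), the example graph, the node weights, and the `max_size` cap are all assumptions chosen for this example. The exponential number of subsets is the reason this approach does not scale.

```python
from itertools import combinations
from math import sqrt


def is_connected(nodes, adj):
    """Return True if `nodes` induces a connected subgraph of `adj`."""
    nodes = set(nodes)
    if not nodes:
        return False
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in nodes and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def score(nodes, weight):
    # Illustrative score (an assumption, not taken from the paper):
    # total node weight divided by the square root of the subset size.
    return sum(weight[v] for v in nodes) / sqrt(len(nodes))


def best_connected_subgraph(adj, weight, max_size):
    """Exhaustively search connected subsets with at most `max_size` nodes."""
    best_nodes, best_score = None, float("-inf")
    for k in range(1, max_size + 1):
        for subset in combinations(sorted(adj), k):
            if is_connected(subset, adj):
                s = score(subset, weight)
                if s > best_score:
                    best_nodes, best_score = set(subset), s
    return best_nodes, best_score


# Hypothetical example: a path graph 0-1-2-3-4 with made-up node weights.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
weight = {0: 0.1, 1: 2.0, 2: 1.8, 3: 0.2, 4: 2.5}

best_nodes, best_score = best_connected_subgraph(adj, weight, max_size=3)
print(best_nodes, round(best_score, 3))
```

With these weights and a cap of three nodes, the search selects the adjacent pair of high-weight nodes 1 and 2. The heavier node 4 is excluded because the only way to reach it is through the low-weight node 3, and that path exceeds the size cap.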
Citation
Cadena, J., Chen, F., & Vullikanti, A. (2017). Near-optimal and practical algorithms for graph scan statistics. In Proceedings of the 17th SIAM International Conference on Data Mining, SDM 2017 (pp. 624–632). Society for Industrial and Applied Mathematics Publications. https://doi.org/10.1137/1.9781611974973.70