Cluster summarization with dense region detection

Elnaz Bigdeli; Mahdi Mohammadi; Bijan Raahemi; Stan Matwin

Conference Proceedings

Communications in Computer and Information Science (2015) 553 68-83

DOI: 10.1007/978-3-319-25840-9_5

Abstract

This paper introduces a new approach to summarizing clusters by finding dense regions and representing each cluster as a Gaussian Mixture Model (GMM). The GMM summarization allows us to summarize a cluster efficiently and then regenerate the original data with high accuracy. Unlike the classical representation of a cluster by a center and a radius, the proposed approach preserves information about the shape of the cluster as well as the distribution of its samples. Because a GMM is a parametric model whose number of Gaussian components must be specified, we propose a method to determine that number automatically. Each GMM can summarize a cluster generated by any kind of clustering algorithm. Moreover, when a new sample is presented to the GMMs of the clusters, a membership value is calculated for each cluster, and the sample is assigned to the closest cluster according to these membership values. Employing GMMs to summarize clusters offers several advantages with regard to accuracy, detection rate, memory efficiency, and time complexity. We evaluate the proposed method on a variety of datasets, both synthetic datasets and real datasets from the UCI repository. We examine the quality of the summarized clusters generated by the proposed method in terms of the DUNN, DB, SD, and SSD indexes, and compare them with those of the well-known ABACUS method. We also apply the proposed algorithm to anomaly detection, study its performance in terms of false alarm and detection rates, and compare the results with Negative Selection, Naïve models, and ABACUS. Furthermore, we compare the memory usage and processing time of the proposed algorithm with those of the other algorithms. The results illustrate that our algorithm outperforms other well-known anomaly detection algorithms in terms of accuracy and detection rate, as well as memory usage and processing time.
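The workflow described in the abstract (summarize each cluster as a mixture, score a new sample against every summary, regenerate data from a summary) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how dense regions are detected or how covariances are modeled, so the sketch assumes the dense regions are already given as lists of points, uses one diagonal-covariance Gaussian per region, and treats the mixture density as the membership value. All function names (`summarize_cluster`, `membership`, `assign`, `regenerate`) are hypothetical.

```python
import math
import random


def summarize_cluster(regions):
    """Summarize one cluster as a diagonal-covariance Gaussian mixture.

    `regions` is a list of dense regions (assumed to be detected already),
    each a list of points given as tuples of floats.
    Returns a list of (weight, means, variances), one entry per region.
    """
    total = sum(len(region) for region in regions)
    mixture = []
    for region in regions:
        n, dim = len(region), len(region[0])
        means = [sum(p[d] for p in region) / n for d in range(dim)]
        # A small floor keeps variances positive for degenerate regions.
        variances = [
            max(sum((p[d] - means[d]) ** 2 for p in region) / n, 1e-6)
            for d in range(dim)
        ]
        mixture.append((n / total, means, variances))
    return mixture


def membership(mixture, x):
    """Mixture density at x, used here as the membership value."""
    density = 0.0
    for weight, means, variances in mixture:
        log_pdf = 0.0
        for xi, m, v in zip(x, means, variances):
            log_pdf += -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        density += weight * math.exp(log_pdf)
    return density


def assign(summaries, x):
    """Index of the cluster whose summary gives x the highest membership."""
    return max(range(len(summaries)), key=lambda k: membership(summaries[k], x))


def regenerate(mixture, n, rng):
    """Draw n approximate samples from a cluster summary."""
    weights = [w for w, _, _ in mixture]
    samples = []
    for _ in range(n):
        _, means, variances = rng.choices(mixture, weights=weights)[0]
        samples.append(
            tuple(rng.gauss(m, math.sqrt(v)) for m, v in zip(means, variances))
        )
    return samples
```

Each summary stores only a weight, a mean vector, and a variance vector per dense region, which is where the memory saving over keeping the raw samples would come from in a scheme of this kind.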

Cite

Bigdeli, E., Mohammadi, M., Raahemi, B., & Matwin, S. (2015). Cluster summarization with dense region detection. In Communications in Computer and Information Science (Vol. 553, pp. 68–83). Springer Verlag. https://doi.org/10.1007/978-3-319-25840-9_5
