A-BIRCH: Automatic threshold estimation for the BIRCH clustering algorithm

Boris Lorbeer; Ana Kosareva; Bersant Deva; Dženan Softić; Peter Ruppel; Axel Küpper

Conference Proceedings

Advances in Intelligent Systems and Computing (2017) 529 169-178

DOI: 10.1007/978-3-319-47898-2_18

Abstract

Clustering algorithms have recently regained attention with the availability of large datasets and the rise of parallelized computing architectures. However, most clustering algorithms do not scale well with increasing dataset sizes and require proper parametrization to produce correct results. In this paper we present A-BIRCH, an approach for automatic threshold estimation for the BIRCH clustering algorithm using the Gap Statistic. This approach renders the global clustering step of BIRCH unnecessary and does not require knowing the expected number of clusters beforehand. This is achieved by analyzing a small representative subset of the data to extract attributes such as the cluster radius and the minimal cluster distance. These attributes are then used to compute a threshold that results, with high probability, in the correct clustering of elements. To analyze the representative subset, we parallelized the Gap Statistic to improve performance and ensure scalability.
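To illustrate why the threshold matters, the following is a minimal sketch of BIRCH-style threshold clustering, not the authors' implementation. It assumes a flat list of clustering features instead of a CF-tree, and the names `ClusteringFeature` and `threshold_cluster`, the sample points, and the threshold values are all hypothetical. The threshold here is picked by hand; the paper estimates it from a representative subset using the Gap Statistic, which this sketch does not reproduce.

```python
import math


class ClusteringFeature:
    """BIRCH-style cluster summary: point count, linear sum, sum of squared norms."""

    def __init__(self, dim):
        self.n = 0
        self.ls = [0.0] * dim
        self.ss = 0.0

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)

    def centroid(self):
        return [a / self.n for a in self.ls]

    def radius_if_added(self, point):
        """Radius of the cluster if `point` were absorbed: R^2 = SS/N - ||LS/N||^2."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        r2 = ss / n - sum((a / n) ** 2 for a in ls)
        return math.sqrt(max(r2, 0.0))  # clamp tiny negative rounding errors


def threshold_cluster(points, threshold):
    """Single pass: absorb each point into the nearest cluster if the
    resulting radius stays within `threshold`, otherwise open a new cluster.
    Simplification (assumption): a flat list replaces the CF-tree."""
    clusters = []
    for p in points:
        nearest = min(clusters, key=lambda c: math.dist(c.centroid(), p), default=None)
        if nearest is not None and nearest.radius_if_added(p) <= threshold:
            nearest.add(p)
        else:
            cf = ClusteringFeature(len(p))
            cf.add(p)
            clusters.append(cf)
    return clusters


# Hypothetical data: two tight groups near (0, 0) and (5, 5), interleaved.
points = [
    (0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.2, 4.9),
    (-0.1, 0.2), (4.8, 5.1), (0.1, -0.2), (5.1, 5.2),
]

good = threshold_cluster(points, threshold=1.0)     # between cluster radius and cluster distance
too_big = threshold_cluster(points, threshold=10.0)  # merges everything
too_small = threshold_cluster(points, threshold=0.01)  # fragments the groups

print(len(good), len(too_big), len(too_small))
```

With a threshold that lies between the typical cluster radius and the distance separating clusters, the single pass already recovers the groups, so no separate global clustering step and no preset cluster count are needed in this toy setting. A threshold that is too large or too small merges or fragments the groups, which is the sensitivity that automatic estimation is meant to address.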

Cite

Lorbeer, B., Kosareva, A., Deva, B., Softić, D., Ruppel, P., & Küpper, A. (2017). A-BIRCH: Automatic threshold estimation for the BIRCH clustering algorithm. In Advances in Intelligent Systems and Computing (Vol. 529, pp. 169–178). Springer Verlag. https://doi.org/10.1007/978-3-319-47898-2_18
