A-BIRCH: Automatic threshold estimation for the BIRCH clustering algorithm

Abstract

Clustering algorithms have recently regained attention with the availability of large datasets and the rise of parallelized computing architectures. However, most clustering algorithms do not scale well with increasing dataset size and require proper parametrization to produce correct results. In this paper we present A-BIRCH, an approach for automatic threshold estimation for the BIRCH clustering algorithm using the Gap Statistic. This approach renders the global clustering step of BIRCH unnecessary and does not require prior knowledge of the expected number of clusters. This is achieved by analyzing a small representative subset of the data to extract attributes such as the cluster radius and the minimal cluster distance. These attributes are then used to compute a threshold that results, with high probability, in the correct clustering of elements. For the analysis of the representative subset, we parallelized the Gap Statistic to improve performance and ensure scalability.
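The attribute-extraction step described in the abstract can be illustrated with a minimal sketch. It assumes the representative subset has already been clustered (for example by a Gap Statistic based procedure), so every sample point carries a label. The function names, the use of the largest radius and smallest centroid distance, and in particular the weights in `estimate_threshold` are hypothetical placeholders chosen for illustration; they are not the formula or the values from the paper.

```python
import math
from collections import defaultdict


def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    dim = len(points[0])
    return tuple(sum(p[i] for p in points) / n for i in range(dim))


def cluster_radius(points, center):
    """Root-mean-square distance of the points from their centroid,
    the radius definition used for BIRCH clustering features."""
    return math.sqrt(sum(math.dist(p, center) ** 2 for p in points) / len(points))


def sample_attributes(points, labels):
    """Return (largest cluster radius, smallest centroid-to-centroid distance)
    for a labeled sample. The labels are assumed to come from an earlier
    clustering of the representative subset."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        groups[label].append(point)
    if len(groups) < 2:
        raise ValueError("need at least two clusters to measure a cluster distance")

    centers = {label: centroid(members) for label, members in groups.items()}
    max_radius = max(cluster_radius(members, centers[label])
                     for label, members in groups.items())
    center_list = list(centers.values())
    min_distance = min(math.dist(a, b)
                       for i, a in enumerate(center_list)
                       for b in center_list[i + 1:])
    return max_radius, min_distance


def estimate_threshold(max_radius, min_distance,
                       radius_weight=1.0, distance_weight=0.25):
    """Hypothetical combining rule: a weighted sum of the two attributes.
    The weights are illustrative assumptions, not values from the paper."""
    return radius_weight * max_radius + distance_weight * min_distance


# Two well-separated toy clusters in 2D.
sample = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
sample_labels = [0, 0, 1, 1]
radius, distance = sample_attributes(sample, sample_labels)
threshold = estimate_threshold(radius, distance)
```

A threshold obtained this way would then be passed to a BIRCH implementation as its threshold parameter; how the two attributes should actually be weighted is the subject of the paper and is not reproduced here.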

Cite

Lorbeer, B., Kosareva, A., Deva, B., Softić, D., Ruppel, P., & Küpper, A. (2017). A-BIRCH: Automatic threshold estimation for the BIRCH clustering algorithm. In Advances in Intelligent Systems and Computing (Vol. 529, pp. 169–178). Springer Verlag. https://doi.org/10.1007/978-3-319-47898-2_18
