HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset with respect to a parameter mpts. While a small change in mpts typically leads to a small change in the clustering structure, choosing a 'good' mpts value can be challenging: depending on the data distribution, a high or low mpts value may be more appropriate, and certain clusters may reveal themselves at different values. To explore results for a range of mpts values, one has to run HDBSCAN* for each value independently, which can be computationally impractical. In this paper, we propose an approach to efficiently compute all HDBSCAN* hierarchies for a range of mpts values by building upon results from computational geometry to replace HDBSCAN*'s complete graph with a smaller equivalent graph. An experimental evaluation shows that our approach can obtain over one hundred hierarchies at a computational cost equivalent to running HDBSCAN* about twice, which corresponds to a speedup of more than 60 times compared to running HDBSCAN* independently that many times. We also propose a series of visualizations that allow users to analyze a collection of hierarchies for a range of mpts values, along with case studies that illustrate how these analyses are performed.
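To make the cost argument concrete, here is a minimal sketch of the naive baseline the abstract compares against: for each mpts value, compute every point's core distance, form the complete mutual reachability graph, and extract its minimum spanning tree, from which HDBSCAN* derives its hierarchy. This is not the authors' method (it does not use the smaller equivalent graph the paper proposes). The function names are hypothetical, and the sketch assumes Euclidean distance and the convention that a point counts as its own first neighbor. It stops at the MST rather than building the full cluster hierarchy. Each call takes at least quadratic time in the number of points, and the loop repeats that work once per mpts value.

```python
import math
from typing import Dict, Iterable, List, Sequence, Tuple

Point = Sequence[float]
Edge = Tuple[int, int, float]  # (u, v, mutual reachability weight)


def core_distances(points: List[Point], mpts: int) -> List[float]:
    """Distance from each point to its mpts-th nearest neighbor,
    counting the point itself as the first neighbor (requires 1 <= mpts <= n)."""
    cores = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)  # dists[0] == 0.0 (self)
        cores.append(dists[mpts - 1])
    return cores


def mutual_reachability_mst(points: List[Point], mpts: int) -> List[Edge]:
    """Prim's algorithm over the complete mutual reachability graph."""
    n = len(points)
    core = core_distances(points, mpts)

    def mreach(i: int, j: int) -> float:
        # Mutual reachability distance: max of both core distances and the raw distance.
        return max(core[i], core[j], math.dist(points[i], points[j]))

    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known edge weight connecting each vertex to the tree
    parent = [-1] * n
    if n:
        best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Add the cheapest vertex not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax edges from u to every vertex outside the tree (complete graph).
        for v in range(n):
            if not in_tree[v]:
                w = mreach(u, v)
                if w < best[v]:
                    best[v] = w
                    parent[v] = u
    return edges


def msts_for_range(points: List[Point], mpts_values: Iterable[int]) -> Dict[int, List[Edge]]:
    """Naive baseline: one independent MST computation per mpts value."""
    return {m: mutual_reachability_mst(points, m) for m in mpts_values}


# Usage: two small groups of points, mpts from 1 to 3.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0)]
trees = msts_for_range(pts, range(1, 4))
for m, tree in trees.items():
    print(m, round(sum(w for _, _, w in tree), 3))
```

With mpts = 1 every core distance is zero, so the tree reduces to the ordinary Euclidean minimum spanning tree; larger mpts values raise the core distances and, with them, the edge weights.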
Neto, A. C. A., Sander, J., Campello, R. J. G. B., & Nascimento, M. A. (2021). Efficient Computation and Visualization of Multiple Density-Based Clustering Hierarchies. IEEE Transactions on Knowledge and Data Engineering, 33(8), 3075–3089. https://doi.org/10.1109/TKDE.2019.2962412