Abstract
HDBSCAN*, a state-of-the-art density-based hierarchical clustering method, produces a hierarchical organization of clusters in a dataset with respect to a parameter mpts. While a small change in mpts typically leads to a small change in the clustering structure, choosing a "good" mpts value can be challenging: depending on the data distribution, a high or low mpts value may be more appropriate, and certain clusters may reveal themselves at different values. To explore results for a range of mpts values, one has to run HDBSCAN* for each value independently, which can be computationally impractical. In this paper, we propose an approach to efficiently compute all HDBSCAN* hierarchies for a range of mpts values by building upon results from computational geometry to replace HDBSCAN*'s complete graph with a smaller equivalent graph. An experimental evaluation shows that our approach can obtain over one hundred hierarchies for a computational cost equivalent to running HDBSCAN* about twice, which corresponds to a speedup of more than 60 times compared to running HDBSCAN* independently that many times. We also propose a series of visualizations that allow users to analyze a collection of hierarchies for a range of mpts values, along with case studies that illustrate how these analyses are performed.
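To make the cost of the per-value baseline concrete, the following is a minimal sketch of what "running HDBSCAN* for each value independently" involves at its core: for every mpts, the complete mutual reachability graph is rebuilt and its minimum spanning tree is recomputed. This is not the paper's method (which replaces the complete graph with a smaller equivalent one). The function names are hypothetical, and the sketch assumes Euclidean distance and a core distance that counts the point itself among its mpts nearest neighbors.

```python
import math

def pairwise_distances(points):
    # Full n x n Euclidean distance matrix (the "complete graph").
    return [[math.dist(p, q) for q in points] for p in points]

def core_distances(dist, mpts):
    # Assumption: the point itself counts as its own first neighbor,
    # so the core distance is the (mpts)-th smallest entry of its row.
    return [sorted(row)[mpts - 1] for row in dist]

def mutual_reachability(dist, mpts):
    # d_mreach(a, b) = max(core(a), core(b), d(a, b)) for a != b.
    core = core_distances(dist, mpts)
    n = len(dist)
    return [
        [0.0 if i == j else max(core[i], core[j], dist[i][j]) for j in range(n)]
        for i in range(n)
    ]

def prim_mst(weights):
    # Prim's algorithm on a dense weight matrix, O(n^2) per call.
    n = len(weights)
    in_tree = [False] * n
    in_tree[0] = True
    best = list(weights[0])   # cheapest known edge into the tree, per vertex
    parent = [0] * n          # tree endpoint of that cheapest edge
    edges = []
    for _ in range(n - 1):
        j = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        edges.append((parent[j], j, best[j]))
        in_tree[j] = True
        for v in range(n):
            if not in_tree[v] and weights[j][v] < best[v]:
                best[v] = weights[j][v]
                parent[v] = j
    return edges

def naive_msts(points, mpts_values):
    # Baseline: one full mutual reachability graph and one MST per mpts value.
    dist = pairwise_distances(points)
    return {m: prim_mst(mutual_reachability(dist, m)) for m in mpts_values}
```

Calling `naive_msts([(0.0,), (1.0,), (2.0,), (10.0,)], range(1, 4))` returns one spanning tree per mpts value. Each tree costs quadratic time in the number of points, so the total cost of this baseline grows linearly with the number of mpts values explored, which is the overhead the abstract describes as impractical.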
Neto, A. C. A., Sander, J., Campello, R. J. G. B., & Nascimento, M. A. (2021). Efficient Computation and Visualization of Multiple Density-Based Clustering Hierarchies. IEEE Transactions on Knowledge and Data Engineering, 33(8), 3075–3089. https://doi.org/10.1109/TKDE.2019.2962412