Previously, we showed that dividing 2D datasets into grid boxes gives a satisfactory estimate of the cluster count by detecting local maxima in data density relative to nearby grid boxes. The algorithm was robust for datasets with clusters of different sizes and with distributions that deviate from Gaussian to a certain degree. Because cluster count is difficult to estimate by visualization in higher-dimensional datasets, the goal here was to extend the method to higher dimensions and to improve the speed of the implementation. The improved algorithm yielded satisfactory results by examining data density in a hypercube grid, which points towards possible approaches for addressing the curse of dimensionality. In addition, a six-fold boost in the average run speed of the implementation could be achieved by adopting a generalized version of quadratic binary search.
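The core idea of counting density peaks on a hypercube grid can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `count_density_peaks`, the parameters `cell_size` and `min_count`, and the choice to compare each cell against all directly adjacent cells are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def count_density_peaks(points, cell_size, min_count=1):
    """Count grid cells whose point count strictly exceeds every adjacent cell.

    Illustrative sketch only; `cell_size` and `min_count` are hypothetical
    parameters, not taken from the paper.
    """
    # Map each point to the integer coordinates of the hypercube cell it falls in.
    counts = Counter(tuple(int(x // cell_size) for x in p) for p in points)
    if not counts:
        return 0
    dim = len(next(iter(counts)))
    # All 3^d - 1 offsets to adjacent cells (sharing a face, edge, or corner).
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]
    peaks = 0
    for cell, n in counts.items():
        if n < min_count:
            continue  # skip sparsely populated cells (assumed noise filter)
        neighbours = (
            counts.get(tuple(c + d for c, d in zip(cell, o)), 0) for o in offsets
        )
        # A cell is a local maximum if it is denser than every adjacent cell.
        if all(n > m for m in neighbours):
            peaks += 1
    return peaks


# Two small synthetic groups in 3D, each with one dense cell and one sparser neighbour.
points = (
    [(0.5, 0.5, 0.5)] * 5
    + [(1.5, 0.5, 0.5)] * 2
    + [(5.5, 5.5, 5.5)] * 4
    + [(5.5, 4.5, 5.5)] * 1
)
print(count_density_peaks(points, cell_size=1.0))  # prints 2
```

In this sketch the number of adjacent cells grows as 3^d - 1 with dimension d, and ties between adjacent cells are not counted as peaks. A practical implementation would need to handle both of these points.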
Lo, C. C. wing, Chowdhury, J., Hollander, M., Ahmed, A. W., Sood, S., Sproul, K., & Hadgis, A. (2019). Facilitating Cluster Counting in Multi-dimensional Feature Space by Intermediate Information Grouping. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11580 LNAI, pp. 284–298). Springer Verlag. https://doi.org/10.1007/978-3-030-22419-6_20