An Approach to Silhouette and Dunn Clustering Indices Applied to Big Data in Spark


Abstract

The K-Means and Bisecting K-Means clustering algorithms require the number of clusters into which the dataset is to be divided, ideally the optimal one. The Spark implementations of these algorithms include a method that is used to estimate this number. Unfortunately, this measure lacks precision because it takes into account only a sum of intra-cluster distances, which can produce misleading results. Moreover, it has not been well validated in previous research on clustering indices. Therefore, we introduce a new Spark implementation of the Silhouette and Dunn indices. These clustering indices have been tested in previous works. The results obtained show the potential of Silhouette and Dunn to deal with Big Data.
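
To make the two indices named in the abstract concrete, the following is a minimal single-machine sketch in plain Python using their standard textbook definitions. It is not the authors' Spark implementation. The function names (`silhouette`, `dunn`, `group_by_label`), the Euclidean distance, and the toy data are assumptions chosen for illustration, and the sketch assumes at least two clusters and non-degenerate data.

```python
import math
from itertools import combinations


def euclidean(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def silhouette(points, labels):
    """Mean silhouette over all points: s = (b - a) / max(a, b).

    a is the mean distance to the other members of the point's own cluster,
    b is the smallest mean distance to the members of any other cluster.
    """
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # The point itself contributes distance 0, so divide by len(own) - 1.
        a = sum(euclidean(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(euclidean(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def dunn(points, labels):
    """Dunn index: smallest inter-cluster distance / largest cluster diameter."""
    clusters = list(group_by_label(points, labels).values())
    min_separation = min(
        euclidean(p, q)
        for c1, c2 in combinations(clusters, 2)
        for p in c1
        for q in c2
    )
    max_diameter = max(
        euclidean(p, q) for c in clusters for p in c for q in c
    )
    return min_separation / max_diameter


# Hypothetical toy data: two tight, well-separated groups.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]  # groups match the geometry
bad_labels = [0, 1, 0, 1]   # groups cut across the geometry

print(silhouette(points, good_labels), dunn(points, good_labels))
print(silhouette(points, bad_labels), dunn(points, bad_labels))
```

Both indices combine cohesion within clusters with separation between clusters, whereas a sum of intra-cluster distances captures only the former. Both are higher for the labeling that matches the geometry, so in principle they can be compared across candidate values of k. The pairwise distance computations here are quadratic in the number of points, which is why a naive version like this does not scale to Big Data.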

Cite

Luna-Romera, J. M., Martínez-Ballesteros, M. D. M., García-Gutiérrez, J., & Riquelme-Santos, J. C. (2016). An approach to silhouette and dunn clustering indices applied to big data in spark. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9868 LNAI, pp. 160–169). Springer Verlag. https://doi.org/10.1007/978-3-319-44636-3_15
