An approach to Silhouette and Dunn clustering indices applied to Big Data in Spark

José María Luna-Romera; María Del Mar Martínez-Ballesteros; Jorge García-Gutiérrez; José C. Riquelme-Santos

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9868 LNAI 160-169

DOI: 10.1007/978-3-319-44636-3_15

13 Citations

29 Readers

Abstract

K-Means and Bisecting K-Means clustering algorithms need to be given the optimal number of clusters into which the dataset should be divided. Spark implementations of these algorithms include a method that is used to calculate this number. Unfortunately, this measurement lacks precision because it only takes into account a sum of intra-cluster distances, which can mislead the results. Moreover, this measurement has not been well contrasted in previous research on clustering indices. Therefore, we introduce a new Spark implementation of the Silhouette and Dunn indices. These clustering indices have been tested in previous works. The results obtained show the potential of Silhouette and Dunn to deal with Big Data.
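
For readers unfamiliar with the indices named in the abstract, the following is a minimal single-machine sketch in plain Python. It is not the authors' Spark implementation. It assumes Euclidean distance, the common Dunn variant (smallest distance between points of different clusters divided by the largest cluster diameter), and a sum of squared distances to centroids as a stand-in for the intra-cluster measurement the abstract refers to. The function names and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def group(points, labels):
    """Group points by cluster label."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    return clusters


def wssse(points, labels):
    """Sum of squared distances from each point to its cluster centroid."""
    total = 0.0
    for members in group(points, labels).values():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        total += sum(dist(p, centroid) ** 2 for p in members)
    return total


def silhouette(points, labels):
    """Mean silhouette coefficient; needs at least two clusters."""
    clusters = group(points, labels)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(dist(p, q) for q in other) / len(other)
                for k, other in clusters.items() if k != l)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def dunn(points, labels):
    """Dunn index: minimum inter-cluster distance / maximum cluster diameter."""
    clusters = group(points, labels)
    min_separation = min(dist(p, q)
                         for (_, a), (_, b) in combinations(clusters.items(), 2)
                         for p in a for q in b)
    max_diameter = max((dist(p, q)
                        for members in clusters.values()
                        for p, q in combinations(members, 2)), default=0.0)
    return float("inf") if max_diameter == 0 else min_separation / max_diameter


# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = [0, 0, 0, 1, 1, 1]        # matches the visible structure
bad = [0, 0, 0, 0, 1, 1]         # one point assigned to the wrong group
singletons = [0, 1, 2, 3, 4, 5]  # every point in its own cluster

if __name__ == "__main__":
    for name, labels in [("good", good), ("bad", bad)]:
        print(name, wssse(points, labels), silhouette(points, labels),
              dunn(points, labels))
    print("singletons wssse:", wssse(points, singletons))
```

On this toy data the centroid-based sum drops to zero when every point is its own cluster, so minimizing it alone favors ever more clusters, whereas Silhouette and Dunn also account for separation between clusters. Both indices compare pairs of points, which is quadratic in the dataset size in this naive form.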

Cite

Luna-Romera, J. M., Martínez-Ballesteros, M. D. M., García-Gutiérrez, J., & Riquelme-Santos, J. C. (2016). An approach to Silhouette and Dunn clustering indices applied to Big Data in Spark. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9868 LNAI, pp. 160–169). Springer Verlag. https://doi.org/10.1007/978-3-319-44636-3_15
