Performance Analysis of Parallel K-Means with Optimization Algorithms for Clustering on Spark

V. Santhi; Rini Jose

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2018) 10722 LNCS 158-162

DOI: 10.1007/978-3-319-72344-0_12

Abstract

Clustering divides data into meaningful, useful groups, known as clusters, without any prior knowledge about the data. One drawback of K-Means clustering is the selection of initial centroids, which influences the performance of the algorithm. To overcome this issue, optimization algorithms such as Bat and Firefly are executed as a pre-processing step. These algorithms return optimal centroids, which are given as input to the K-Means algorithm. Because clustering is carried out on large data sets, Apache Spark, an open-source software framework, is used. The performance of the optimization algorithms is evaluated and the best-performing one is identified.
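The abstract does not include the authors' code. As an illustration only, the following is a minimal single-machine sketch of the general idea (an optimizer proposes initial centroids, then K-Means refines them), written in plain Python. It assumes a basic Firefly algorithm whose brightness is the negated sum of squared errors (SSE), with hypothetical parameter values (`n_fireflies`, `beta0`, `gamma`, `alpha`); the function names are also hypothetical. It is not the Spark implementation evaluated in the paper, and the Bat variant is not shown.

```python
import math
import random


def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(data, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(dist2(p, c) for c in centroids) for p in data)


def firefly_centroids(data, k, n_fireflies=12, n_iter=30,
                      beta0=1.0, gamma=1.0, alpha=0.2, seed=0):
    """Hypothetical Firefly search for k initial centroids.

    Each firefly encodes k centroids as one flat vector of k * dim numbers;
    a lower SSE means a brighter firefly.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    span = [(max(p[d] for p in data) - lo[d]) or 1.0 for d in range(dim)]

    def unflatten(x):
        return [tuple(x[i * dim:(i + 1) * dim]) for i in range(k)]

    def cost(x):
        return sse(data, unflatten(x))

    # Random start positions inside the bounding box of the data.
    swarm = [[lo[d % dim] + rng.random() * span[d % dim] for d in range(k * dim)]
             for _ in range(n_fireflies)]
    costs = [cost(x) for x in swarm]

    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if costs[j] < costs[i]:  # firefly j is brighter, so i moves toward j
                    # Distance in range-normalised coordinates so gamma is scale-free.
                    r2 = sum(((xi - xj) / span[d % dim]) ** 2
                             for d, (xi, xj) in enumerate(zip(swarm[i], swarm[j])))
                    beta = beta0 * math.exp(-gamma * r2)
                    swarm[i] = [xi + beta * (xj - xi)
                                + alpha * (rng.random() - 0.5) * span[d % dim]
                                for d, (xi, xj) in enumerate(zip(swarm[i], swarm[j]))]
                    costs[i] = cost(swarm[i])
        alpha *= 0.95  # shrink the random step as the swarm converges

    best = min(range(n_fireflies), key=costs.__getitem__)
    return unflatten(swarm[best])


def kmeans(data, centroids, max_iter=100):
    """Lloyd's K-Means, seeded with the given centroids."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        clusters = [[] for _ in centroids]
        for p in data:
            nearest = min(range(len(centroids)), key=lambda c: dist2(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its points; keep it if its cluster is empty.
        updated = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    rng = random.Random(42)
    centres = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
    data = [(cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3))
            for cx, cy in centres for _ in range(20)]
    seeds = firefly_centroids(data, k=3)
    final = kmeans(data, seeds)
    print("SSE of Firefly seeds:", round(sse(data, seeds), 2))
    print("SSE after K-Means:   ", round(sse(data, final), 2))
```

Because Lloyd's iterations never increase the SSE of their starting centroids, the optimizer's only job here is to hand K-Means a good starting point. On a cluster, the same seeds could be passed to Spark's RDD-based MLlib K-Means as an initial model (for example via `KMeans.setInitialModel` in Scala); check the documentation for your Spark version before relying on this.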

Citation (APA)

Santhi, V., & Jose, R. (2018). Performance Analysis of Parallel K-Means with Optimization Algorithms for Clustering on Spark. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10722 LNCS, pp. 158–162). Springer Verlag. https://doi.org/10.1007/978-3-319-72344-0_12
