Parallel K-means clustering algorithm on DNA dataset

Fazilah Othman; Rosni Abdullah; Nur'Aini Abdul Rashid; Rosalina Abdul Salam

Conference Proceedings

Lecture Notes in Computer Science (2004) 3320 248-251

DOI: 10.1007/978-3-540-30501-9_54

15 Citations

35 Readers

Abstract

Clustering is the division of data into groups of similar objects. K-means has been used in much clustering work because the algorithm is simple. Our main effort is to parallelize the k-means clustering algorithm. The parallel version exploits the inherent parallelism in the Distance Calculation and Centroid Update phases. The parallel K-means algorithm is designed so that each of the P participating nodes is responsible for handling n/P data points. We ran the program on a Linux cluster with a maximum of eight nodes using the message-passing programming model. We evaluated it on the percentage of correct answers and on speed-up. The results show that our parallel K-means program performs relatively well on large datasets.
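The decomposition described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it assumes n is the total number of data points, uses numeric tuples and squared Euclidean distance in place of DNA sequences and whatever distance measure the paper used, and replaces real message passing with an ordinary loop over simulated nodes. All function names (`partition`, `local_step`, `parallel_kmeans`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance (an assumed metric, not taken from the paper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def partition(points: Sequence, num_nodes: int) -> List[list]:
    """Split n points into num_nodes contiguous blocks of roughly n/P points each."""
    base, extra = divmod(len(points), num_nodes)
    blocks, start = [], 0
    for rank in range(num_nodes):
        size = base + (1 if rank < extra else 0)
        blocks.append(list(points[start:start + size]))
        start += size
    return blocks


def local_step(block: Sequence[Point], centroids: Sequence[Point]):
    """Distance Calculation on one node: assign each local point to its nearest
    centroid and return per-cluster partial sums and counts."""
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in centroids]
    counts = [0] * len(centroids)
    for p in block:
        nearest = min(range(len(centroids)),
                      key=lambda k: squared_distance(p, centroids[k]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def parallel_kmeans(points: Sequence[Point], centroids: Sequence[Point],
                    num_nodes: int, max_iters: int = 100) -> List[Point]:
    """Simulated parallel K-means: local steps per block, then a global reduction."""
    blocks = partition(points, num_nodes)
    centroids = [tuple(float(c) for c in centroid) for centroid in centroids]
    k, dim = len(centroids), len(centroids[0])
    for _ in range(max_iters):
        # On a real cluster each block would be processed by a different node.
        partials = [local_step(block, centroids) for block in blocks]
        # Centroid Update: combine partial results (stands in for a reduction
        # over all nodes in a message-passing program).
        total_sums = [[0.0] * dim for _ in range(k)]
        total_counts = [0] * k
        for sums, counts in partials:
            for j in range(k):
                total_counts[j] += counts[j]
                for d in range(dim):
                    total_sums[j][d] += sums[j][d]
        new_centroids = [
            tuple(s / total_counts[j] for s in total_sums[j])
            if total_counts[j] else centroids[j]  # keep an empty cluster's centroid
            for j in range(k)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids
```

In this sketch only the per-cluster sums and counts cross node boundaries, so the data exchanged per iteration depends on the number of clusters and dimensions rather than on n. That property is one common reason this kind of decomposition tends to scale better on larger datasets, though the sketch says nothing about the timings reported in the paper.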

Cite

Othman, F., Abdullah, R., Rashid, N. A., & Salam, R. A. (2004). Parallel K-means clustering algorithm on DNA dataset. In Lecture Notes in Computer Science (Vol. 3320, pp. 248–251). Springer Verlag. https://doi.org/10.1007/978-3-540-30501-9_54
