Clustering is the division of data into groups of similar objects. K-means has been used in many clustering studies because of the simplicity of the algorithm. Our main effort is to parallelize the K-means clustering algorithm. The parallel version exploits the inherent parallelism of the Distance Calculation and Centroid Update phases. The parallel K-means algorithm is designed so that each of the P participating nodes is responsible for handling n/P data points. We ran the program on a Linux cluster with a maximum of eight nodes using the message-passing programming model. We evaluated its performance in terms of the percentage of correct answers and its speed-up. The results show that our parallel K-means program performs relatively well on large datasets.
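The data decomposition described above can be illustrated with a minimal sketch. It is not the authors' implementation: it simulates the P nodes sequentially in plain Python instead of using a message-passing library, and every name in it (`split_evenly`, `local_step`, `update_centroids`, `parallel_kmeans`) is hypothetical. Each simulated node assigns its roughly n/P points to the nearest centroid and returns per-cluster partial sums and counts; combining those partial results stands in for the reduction that a message-passing program would perform during the Centroid Update phase.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def split_evenly(points: Sequence[Point], num_nodes: int) -> List[List[Point]]:
    """Split n points into num_nodes contiguous chunks of roughly n/P each."""
    base, extra = divmod(len(points), num_nodes)
    chunks, start = [], 0
    for rank in range(num_nodes):
        size = base + (1 if rank < extra else 0)
        chunks.append(list(points[start:start + size]))
        start += size
    return chunks


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_step(chunk: Sequence[Point], centroids: Sequence[Point]):
    """Distance Calculation on one (simulated) node.

    Returns per-cluster partial coordinate sums and point counts.
    """
    k, dim = len(centroids), len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in chunk:
        nearest = min(range(k), key=lambda j: squared_distance(p, centroids[j]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    return sums, counts


def update_centroids(partials, centroids: Sequence[Point]) -> List[Point]:
    """Centroid Update: combine the partial results from all nodes.

    Assumption: this loop stands in for a global sum across nodes.
    """
    k, dim = len(centroids), len(centroids[0])
    new_centroids = []
    for j in range(k):
        total = sum(counts[j] for _, counts in partials)
        if total == 0:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(tuple(centroids[j]))
        else:
            new_centroids.append(tuple(
                sum(sums[j][d] for sums, _ in partials) / total
                for d in range(dim)
            ))
    return new_centroids


def parallel_kmeans(points, initial_centroids, num_nodes, max_iters=100):
    centroids = [tuple(c) for c in initial_centroids]
    chunks = split_evenly(points, num_nodes)
    for _ in range(max_iters):
        # In a real message-passing program each chunk is processed on its own node.
        partials = [local_step(chunk, centroids) for chunk in chunks]
        new_centroids = update_centroids(partials, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids
```

For example, `parallel_kmeans(points, initial_centroids, num_nodes=8)` would mimic the eight-node configuration. Only the per-cluster sums and counts cross node boundaries in this scheme, so the amount of data exchanged per iteration depends on the number of clusters and dimensions, not on n.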
Othman, F., Abdullah, R., Rashid, N. A., & Salam, R. A. (2004). Parallel K-means clustering algorithm on DNA dataset. In Lecture Notes in Computer Science (Vol. 3320, pp. 248–251). Springer Verlag. https://doi.org/10.1007/978-3-540-30501-9_54