Optimized distributed text document clustering algorithm

J. E. Judith; J. Jayakumari

Conference Proceedings

Advances in Intelligent Systems and Computing (2015) 325 565-574

DOI: 10.1007/978-81-322-2135-7_60

Abstract

Scientific progress has brought a variety of challenges to the field of information retrieval (IR), driven by the growing use of large volumes of data. These data are spread across large-scale distributed networks, and centralizing them for analysis is difficult. There is therefore a need for distributed text document clustering algorithms that overcome the two main challenges in clustering: accuracy and quality. In this paper, an optimized distributed text document clustering algorithm is proposed that uses a distributed particle swarm optimization (DPSO) algorithm to optimize and generate the initial centroids for the distributed K-means (DKMeans) clustering algorithm, which improves the quality of clustering. Similarity is determined using the Jaccard coefficient, which generates coherent clusters and thus improves the accuracy of the proposed algorithm. Extensive simulation-based evaluations on the Reuters-21578 and 20 Newsgroups data sets demonstrate the effectiveness of the algorithm.
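The idea in the abstract, PSO-generated initial centroids handed to K-means with Jaccard similarity, can be illustrated with a small single-machine sketch. This is not the authors' implementation: it is not distributed, it assumes documents are already term vectors, it assumes the extended (Tanimoto) form of the Jaccard coefficient, and the function names (`jaccard`, `cost`, `assign`, `pso_seeds`, `kmeans`), the fitness function, and the PSO parameters (`w`, `c1`, `c2`) are illustrative choices rather than values taken from the paper.

```python
import random


def jaccard(a, b):
    """Extended (Tanimoto) Jaccard coefficient between two real-valued vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom > 0 else 0.0


def cost(docs, centroids):
    """Mean Jaccard distance (1 - similarity) from each document to its best centroid."""
    return sum(1.0 - max(jaccard(d, c) for c in centroids) for d in docs) / len(docs)


def assign(docs, centroids):
    """Label each document with the index of its most similar centroid."""
    k = len(centroids)
    return [max(range(k), key=lambda j: jaccard(d, centroids[j])) for d in docs]


def pso_seeds(docs, k, n_particles=20, iters=30, w=0.72, c1=1.49, c2=1.49, seed=0):
    """Search for k initial centroids with a basic global-best PSO (assumed variant)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    # Each particle is one candidate set of k centroids, started on k distinct documents.
    pos = [[list(d) for d in rng.sample(docs, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[c[:] for c in p] for p in pos]
    pbest_cost = [cost(docs, p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = [c[:] for c in pbest[g]], pbest_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for t in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    # Standard update: inertia + pull to personal best + pull to global best.
                    vel[i][j][t] = (w * vel[i][j][t]
                                    + c1 * r1 * (pbest[i][j][t] - pos[i][j][t])
                                    + c2 * r2 * (gbest[j][t] - pos[i][j][t]))
                    pos[i][j][t] += vel[i][j][t]
            c = cost(docs, pos[i])
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = [x[:] for x in pos[i]], c
            if c < gbest_cost:
                gbest, gbest_cost = [x[:] for x in pos[i]], c
    return gbest


def kmeans(docs, centroids, iters=20):
    """K-means refinement: assign by Jaccard similarity, update centroids as means."""
    centroids = [c[:] for c in centroids]
    for _ in range(iters):
        labels = assign(docs, centroids)
        new = []
        for j in range(len(centroids)):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            # An empty cluster keeps its previous centroid.
            new.append([sum(col) / len(members) for col in zip(*members)]
                       if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return assign(docs, centroids), centroids


# Toy binary term vectors (hypothetical data): two topics with disjoint vocabularies.
docs = [[1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0], [1, 0, 1, 0, 0, 0],
        [0, 0, 0, 1, 1, 1], [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 0, 1]]
seeds = pso_seeds(docs, k=2)
labels, centroids = kmeans(docs, seeds)
print(labels)
```

In a distributed setting, each node would presumably run the assignment step on its local documents and exchange only centroid summaries; the sketch leaves that coordination out because the abstract does not describe it.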

Cite (APA)

Judith, J. E., & Jayakumari, J. (2015). Optimized distributed text document clustering algorithm. In Advances in Intelligent Systems and Computing (Vol. 325, pp. 565–574). Springer Verlag. https://doi.org/10.1007/978-81-322-2135-7_60
