Scientific progress has brought a variety of challenges to the field of information retrieval (IR), driven by the growing use of large volumes of data. These enormous amounts of data are available from large-scale distributed networks, and centralizing them for analysis is difficult. There is therefore a need for distributed text document clustering algorithms that overcome the two main challenges in clustering: accuracy and quality. In this paper, an optimized distributed text document clustering algorithm is proposed that uses a distributed particle swarm optimization (DPSO) algorithm to optimize and generate the initial centroids for the distributed K-means (DKMeans) clustering algorithm, which improves the quality of clustering. Similarity is determined using the Jaccard coefficient, which generates coherent clusters and thus improves the accuracy of the proposed algorithm. Extensive simulation-based evaluations on the Reuters-21578 and 20 Newsgroups data sets are carried out to demonstrate the effectiveness of the algorithm.
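As a rough illustration of the similarity step, the sketch below computes a set-based Jaccard coefficient between documents and assigns each document to its most similar centroid. It is a minimal single-machine sketch under stated assumptions, not the authors' implementation: the abstract does not say whether the Jaccard coefficient is applied to term sets or to weighted term vectors, and the names `jaccard`, `assign_to_centroids`, the toy documents, and the hand-picked centroids are all hypothetical. In the proposed algorithm the initial centroids would come from DPSO rather than being fixed by hand.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two term collections."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention assumed here for two empty documents
    return len(a & b) / len(union)


def assign_to_centroids(docs, centroids):
    """Return, for each document, the index of its most similar centroid."""
    return [
        max(range(len(centroids)), key=lambda k: jaccard(doc, centroids[k]))
        for doc in docs
    ]


# Toy documents (hypothetical), tokenized on whitespace.
docs = [
    "oil prices rise on supply fears".split(),
    "crude oil supply falls".split(),
    "team wins hockey game".split(),
]

# Hand-picked centroids as term sets; an assumption standing in for
# the DPSO-generated initial centroids described in the abstract.
centroids = [{"oil", "supply", "prices"}, {"hockey", "game", "team"}]

labels = assign_to_centroids(docs, centroids)
print(labels)
```

In a distributed setting, each node could run this assignment step on its local documents and exchange only centroid summaries, though how the paper coordinates nodes is not stated in the abstract.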
CITATION STYLE
Judith, J. E., & Jayakumari, J. (2015). Optimized distributed text document clustering algorithm. In Advances in Intelligent Systems and Computing (Vol. 325, pp. 565–574). Springer Verlag. https://doi.org/10.1007/978-81-322-2135-7_60