Optimized distributed text document clustering algorithm

Abstract

Scientific progress has created a variety of challenges in the field of information retrieval (IR), driven largely by the growing use of large volumes of data. Much of this data resides on large-scale distributed networks, and centralizing it for analysis is difficult. Distributed text document clustering algorithms are therefore needed that address the two main challenges in clustering: clustering accuracy and clustering quality. In this paper, an optimized distributed text document clustering algorithm is proposed. It uses a distributed particle swarm optimization (DPSO) algorithm to optimize and generate the initial centroids for the distributed K-means (DKMeans) clustering algorithm, which improves clustering quality. Similarity is measured with the Jaccard coefficient, which produces coherent clusters and thus improves the accuracy of the proposed algorithm. Extensive simulation-based evaluations are carried out on benchmark data sets, including Reuters-21578 and 20 Newsgroups, to demonstrate the effectiveness of the algorithm.
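The abstract does not give the update rules, so the pipeline it describes (PSO-generated initial centroids, distributed K-means, Jaccard similarity) can only be illustrated as a minimal sketch under stated assumptions. This sketch takes the "Jaccard coefficient" to be the extended (Tanimoto) form for non-negative term-weight vectors. For brevity, PSO runs at a single coordinator rather than being distributed as in the paper's DPSO. Each particle encodes k candidate centroids scored by the mean best-centroid similarity. DKMeans is modeled as sites that compute per-cluster sums and counts, which a coordinator then merges. All function names and parameter values (`w`, `c1`, `c2`, swarm size, iteration counts) are hypothetical.

```python
import random
from typing import List

Vector = List[float]

def jaccard_sim(a: Vector, b: Vector) -> float:
    """Extended (Tanimoto) Jaccard coefficient: a.b / (|a|^2 + |b|^2 - a.b)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    return dot / denom if denom > 0 else 1.0  # two zero vectors count as identical

def assign(docs: List[Vector], centroids: List[Vector]) -> List[int]:
    """Index of the most similar centroid for each document."""
    return [max(range(len(centroids)), key=lambda j: jaccard_sim(d, centroids[j]))
            for d in docs]

def fitness(docs: List[Vector], centroids: List[Vector]) -> float:
    """Mean similarity of each document to its best centroid (higher is better)."""
    return sum(max(jaccard_sim(d, c) for c in centroids) for d in docs) / len(docs)

def pso_init_centroids(docs, k, n_particles=10, iters=30, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Hypothetical PSO seeding: each particle position is a list of k centroids."""
    rng = random.Random(seed)
    dim = len(docs[0])
    pos = [[list(d) for d in rng.sample(docs, k)] for _ in range(n_particles)]
    vel = [[[0.0] * dim for _ in range(k)] for _ in range(n_particles)]
    pbest = [[list(c) for c in p] for p in pos]
    pbest_fit = [fitness(docs, p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = [list(c) for c in pbest[g]], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for j in range(k):
                for t in range(dim):
                    r1, r2 = rng.random(), rng.random()
                    vel[i][j][t] = (w * vel[i][j][t]
                                    + c1 * r1 * (pbest[i][j][t] - pos[i][j][t])
                                    + c2 * r2 * (gbest[j][t] - pos[i][j][t]))
                    # Clamp at zero so centroids stay valid non-negative term weights.
                    pos[i][j][t] = max(0.0, pos[i][j][t] + vel[i][j][t])
            f = fitness(docs, pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = [list(c) for c in pos[i]], f
                if f > gbest_fit:
                    gbest, gbest_fit = [list(c) for c in pos[i]], f
    return gbest

def local_stats(partition: List[Vector], centroids: List[Vector]):
    """Per-site step: per-cluster vector sums and counts for one data partition."""
    k, dim = len(centroids), len(centroids[0])
    sums, counts = [[0.0] * dim for _ in range(k)], [0] * k
    for d, j in zip(partition, assign(partition, centroids)):
        counts[j] += 1
        sums[j] = [s + x for s, x in zip(sums[j], d)]
    return sums, counts

def distributed_kmeans(partitions, centroids, iters=20):
    """Coordinator merges the sites' partial sums; raw documents never leave a site."""
    for _ in range(iters):
        stats = [local_stats(p, centroids) for p in partitions]
        new = []
        for j, c in enumerate(centroids):
            n = sum(cnt[j] for _, cnt in stats)
            if n == 0:
                new.append(c)  # keep an empty cluster's centroid unchanged
            else:
                new.append([sum(s[j][t] for s, _ in stats) / n for t in range(len(c))])
        if new == centroids:
            break
        centroids = new
    return centroids

# Toy term-weight vectors: two topics over a four-term vocabulary.
docs = [[2.0, 1.0, 0.0, 0.0], [3.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 2.0],
        [0.0, 0.0, 2.0, 3.0], [2.0, 2.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
partitions = [docs[:3], docs[3:]]  # two simulated sites
seeds = pso_init_centroids(docs, k=2)
labels = assign(docs, distributed_kmeans(partitions, seeds))
print(labels)
```

Only per-cluster sums and counts cross site boundaries in `distributed_kmeans`, which reflects the motivation given above that centralizing the raw data is difficult. The toy documents are invented for illustration and are not drawn from Reuters-21578 or 20 Newsgroups.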

Cite

Judith, J. E., & Jayakumari, J. (2015). Optimized distributed text document clustering algorithm. In Advances in Intelligent Systems and Computing (Vol. 325, pp. 565–574). Springer Verlag. https://doi.org/10.1007/978-81-322-2135-7_60