Distributed Clustering of Text Collections

Juan Zamora; Héctor Allende-Cid; Marcelo Mendoza

Journal Article, Open Access

IEEE Access (2019), vol. 7, pp. 155671-155685

DOI: 10.1109/ACCESS.2019.2949455

Abstract

Current data processing tasks require efficient approaches capable of dealing with large databases. A promising strategy consists in distributing the data across several computers, each of which solves part of the problem; these partial answers are then integrated to obtain a final solution. We introduce distributed shared nearest neighbors (D-SNN), a novel clustering algorithm that works with disjoint partitions of the data. Our algorithm produces a global clustering solution whose performance is competitive with that of centralized approaches. The algorithm works effectively with high-dimensional data, which makes it suitable for document clustering tasks. Experimental results on five data sets show that our proposal is competitive in terms of quality performance measures when compared with state-of-the-art methods.
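
The abstract does not spell out how D-SNN works, so the following is only a minimal sketch of the generic shared-nearest-neighbor idea that the algorithm's name refers to: two items are considered similar when their k-nearest-neighbor lists overlap. It is not the authors' distributed method. The function names, the choice of cosine distance, the toy vectors, and k = 2 are all assumptions made for illustration.

```python
from math import sqrt


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def k_nearest(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: cosine_distance(p, points[j]))
        neighbors.append(set(others[:k]))
    return neighbors


def snn_similarity(neighbors, i, j):
    """Shared-nearest-neighbor similarity: size of the neighbor-list overlap."""
    return len(neighbors[i] & neighbors[j])


# Hypothetical toy "documents" as 2-D term-weight vectors: two loose groups.
docs = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
]
nn = k_nearest(docs, k=2)

same_group = snn_similarity(nn, 0, 1)   # documents from the same group
cross_group = snn_similarity(nn, 0, 3)  # documents from different groups
print(same_group, cross_group)
```

In this toy data, documents from the same group share a neighbor while documents from different groups share none. How neighbor information from disjoint partitions is combined into a global clustering is specific to the paper and is not reproduced here.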

Citation (APA)

Zamora, J., Allende-Cid, H., & Mendoza, M. (2019). Distributed Clustering of Text Collections. IEEE Access, 7, 155671–155685. https://doi.org/10.1109/ACCESS.2019.2949455
