The aim of this work is to produce fast, easy-to-apply, yet effective algorithms for clustering large text collections. In this paper, we propose a novel similarity measure between objects, together with clustering algorithms based on it. The similarity between two objects within a cluster is measured from the viewpoint of all other objects outside that cluster. Based on this measure, two optimality criteria are formulated as objective functions for the clustering problem. We analyze the proposed clustering approaches and compare them with popular document clustering algorithms from the literature. Extensive experiments are carried out on various benchmark datasets and evaluated with different metrics. The results show that the proposed criterion functions consistently outperform other well-known clustering criteria and give the best overall performance with the same computational efficiency. © 2010 Springer-Verlag.
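The idea of measuring similarity "from the viewpoint of" objects outside the cluster can be illustrated with a minimal sketch. The abstract gives no formula, so everything below is an assumption made for illustration: documents are represented as equal-length numeric vectors, each outside object d_h serves as a reference point, and the similarity of d_i and d_j is taken as the dot product of (d_i - d_h) and (d_j - d_h), averaged over all outside objects. The function name `multi_viewpoint_similarity` and the toy data are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def multi_viewpoint_similarity(d_i: Vector, d_j: Vector,
                               viewpoints: Sequence[Vector]) -> float:
    """Average, over every viewpoint d_h outside the cluster, of the dot
    product between (d_i - d_h) and (d_j - d_h).

    Assumption: this is one plausible reading of the abstract, not a
    formula quoted from it.
    """
    if not viewpoints:
        raise ValueError("need at least one viewpoint outside the cluster")
    total = 0.0
    for d_h in viewpoints:
        diff_i = [a - h for a, h in zip(d_i, d_h)]  # d_i seen from d_h
        diff_j = [b - h for b, h in zip(d_j, d_h)]  # d_j seen from d_h
        total += dot(diff_i, diff_j)
    return total / len(viewpoints)


# Hypothetical toy data: two documents in one cluster, two outside it.
cluster = [[1.0, 0.0], [0.8, 0.6]]
outside = [[0.0, 1.0], [-1.0, 0.0]]

sim = multi_viewpoint_similarity(cluster[0], cluster[1], outside)
print(f"multi-viewpoint similarity: {sim:.3f}")
```

In this reading, a single fixed reference point (such as the origin, as in cosine similarity on unit vectors) is replaced by many reference points, one per object outside the cluster, which is what distinguishes the measure from a single-viewpoint one.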
CITATION STYLE
Nguyen, D. T., Chen, L., & Chan, C. K. (2010). Multi-viewpoint based similarity measure and optimality criteria for document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6458 LNCS, pp. 49–60). https://doi.org/10.1007/978-3-642-17187-1_5