Multi-viewpoint based similarity measure and optimality criteria for document clustering

Duc Thang Nguyen; Lihui Chen; Chee Keong Chan

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010) 6458 LNCS 49-60

DOI: 10.1007/978-3-642-17187-1_5

Abstract

The aim of this work is to produce fast, easy-to-apply but effective algorithms for clustering large text collections. In this paper, we propose a novel concept of similarity measure among objects and its related clustering algorithms. The similarity between two objects within a cluster is measured from the view of all other objects outside that cluster. As a result, two optimality criteria are formulated as the objective functions for the clustering problem. We analyze and compare the proposed clustering approaches with the popular algorithms for document clustering in the literature. Extensive empirical experiments are carried out on various benchmark datasets and evaluated by different metrics. The results show that our proposed criterion functions consistently outperform the other well-known clustering criteria, and give the best overall performance with the same computational efficiency. © 2010 Springer-Verlag.
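The abstract does not give a formula for the similarity measure, so the following is only a minimal sketch of one plausible reading. It assumes that documents are numeric vectors and that each document outside the cluster acts as a "viewpoint": the two in-cluster documents are re-expressed relative to that viewpoint, their dot product is taken, and the results are averaged over all viewpoints. The function names and the averaging scheme are illustrative assumptions, not the authors' stated definition.

```python
def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def multi_viewpoint_similarity(d_i, d_j, outside):
    """Hypothetical multi-viewpoint similarity (an assumption, see lead-in).

    d_i, d_j : vectors of two documents in the same cluster
    outside  : list of vectors of documents outside that cluster

    Returns the mean of (d_i - d_h) . (d_j - d_h) over all d_h in `outside`.
    """
    if not outside:
        raise ValueError("need at least one document outside the cluster")
    total = 0.0
    for d_h in outside:
        # Look at both documents from the viewpoint d_h.
        u = [a - b for a, b in zip(d_i, d_h)]
        v = [a - b for a, b in zip(d_j, d_h)]
        total += dot(u, v)
    return total / len(outside)


# Toy example: two in-cluster documents and two outside viewpoints.
same_cluster_a = [1.0, 0.0]
same_cluster_b = [1.0, 0.0]
viewpoints = [[0.0, 1.0], [0.0, 0.0]]
score = multi_viewpoint_similarity(same_cluster_a, same_cluster_b, viewpoints)
```

Under this reading, a single fixed origin (as in plain cosine or dot-product similarity) is replaced by many reference points, which is why the score depends on which documents lie outside the cluster.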

Citation (APA)

Nguyen, D. T., Chen, L., & Chan, C. K. (2010). Multi-viewpoint based similarity measure and optimality criteria for document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6458 LNCS, pp. 49–60). https://doi.org/10.1007/978-3-642-17187-1_5