A hybrid algorithm for web document clustering based on frequent term sets and k-means

Le Wang; Li Tian; Yan Jia; Weihong Han

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007) 4537 LNCS 198-203

DOI: 10.1007/978-3-540-72909-9_20

Abstract

To address the major challenges of current web document clustering, namely the huge volume of documents, high dimensionality, and the understandability of clusters, we propose a simple hybrid algorithm (SHDC) based on top-k frequent term sets and k-means. The top-k frequent term sets are used to produce k initial means, which serve as the initial clusters and are then refined by k-means. The final optimal clustering is returned by k-means, while the k frequent term sets provide an understandable description of the clusters. Experimental results on two public datasets indicate that SHDC outperforms two other representative clustering algorithms (farthest-first k-means and k-means with random initialization) in both efficiency and effectiveness. © Springer-Verlag Berlin Heidelberg 2007.
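The abstract describes a two-stage pipeline: mine frequent term sets, use the top k of them to build k initial means, refine with k-means, and label each cluster with its seeding term set. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The abstract does not say how term sets are mined or ranked, or how documents are represented, so the following are all assumptions: brute-force support counting for sets of up to two terms, ranking by support and then by set length, skipping term sets that cover exactly the same documents as an already chosen one, unit-length term-frequency vectors, and Euclidean k-means. All function names (`shdc_sketch`, `top_k_term_sets`, and so on) are hypothetical.

```python
from collections import Counter
from itertools import combinations
import math


def tokenize(doc):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return doc.lower().split()


def frequent_term_sets(docs_terms, min_support, max_size=2):
    # Brute-force support counting for every term set of up to max_size terms.
    # Fine for tiny corpora only; a real miner (e.g. Apriori) would prune.
    counts = Counter()
    for terms in docs_terms:
        uniq = sorted(set(terms))
        for size in range(1, max_size + 1):
            for combo in combinations(uniq, size):
                counts[frozenset(combo)] += 1
    return {ts: c for ts, c in counts.items() if c >= min_support}


def top_k_term_sets(docs_terms, k, min_support=2, max_size=2):
    freq = frequent_term_sets(docs_terms, min_support, max_size)
    # Hypothetical ranking: higher support, then longer sets, then alphabetical.
    ranked = sorted(freq, key=lambda ts: (-freq[ts], -len(ts), sorted(ts)))
    chosen, covers = [], []
    for ts in ranked:
        cover = frozenset(i for i, terms in enumerate(docs_terms) if ts <= set(terms))
        if cover in covers:  # same documents as a chosen set: duplicate seed
            continue
        chosen.append(ts)
        covers.append(cover)
        if len(chosen) == k:
            return chosen, covers
    raise ValueError("fewer than k distinct frequent term sets; lower min_support")


def vectorize(docs_terms):
    # Term-frequency vectors scaled to unit length (an assumed representation).
    vocab = sorted({t for terms in docs_terms for t in terms})
    index = {t: i for i, t in enumerate(vocab)}
    vectors = []
    for terms in docs_terms:
        v = [0.0] * len(vocab)
        for t in terms:
            v[index[t]] += 1.0
        norm = math.sqrt(sum(x * x for x in v))
        vectors.append([x / norm for x in v] if norm else v)
    return vectors


def mean(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest(v, centroids):
    # Index of the centroid with the smallest squared Euclidean distance to v.
    return min(range(len(centroids)),
               key=lambda c: sum((x - y) ** 2 for x, y in zip(v, centroids[c])))


def shdc_sketch(docs, k, min_support=2, max_size=2, max_iter=20):
    docs_terms = [tokenize(d) for d in docs]
    vectors = vectorize(docs_terms)
    term_sets, covers = top_k_term_sets(docs_terms, k, min_support, max_size)
    # Initial means: centroid of the documents containing each top-k term set.
    centroids = [mean([vectors[i] for i in sorted(cover)]) for cover in covers]
    labels = None
    for _ in range(max_iter):  # standard k-means refinement (Lloyd iterations)
        new_labels = [nearest(v, centroids) for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for c in range(k):
            members = [vectors[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = mean(members)
    # Each cluster is described by the term set that seeded it.
    descriptions = [" ".join(sorted(ts)) for ts in term_sets]
    return labels, descriptions


if __name__ == "__main__":
    docs = ["apple banana fruit", "apple banana smoothie", "apple banana salad",
            "python code compiler", "python code interpreter", "python code debugger"]
    labels, descriptions = shdc_sketch(docs, k=2)
    print(labels, descriptions)  # [0, 0, 0, 1, 1, 1] ['apple banana', 'code python']
```

Seeding each mean from the documents that share a frequent term set places the starting points inside topically coherent groups, which is the intuition behind preferring it over random seeds; the same term sets then double as human-readable cluster labels. Whether this matches SHDC's exact selection and weighting scheme cannot be confirmed from the abstract alone.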

Citation (APA)

Wang, L., Tian, L., Jia, Y., & Han, W. (2007). A hybrid algorithm for web document clustering based on frequent term sets and k-means. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4537 LNCS, pp. 198–203). Springer Verlag. https://doi.org/10.1007/978-3-540-72909-9_20

A hybrid algorithm for web document clustering based on frequent term sets and k-means

Abstract

Cite

Register to see more suggestions