Filtering methods for feature selection in web-document clustering

Abstract

This paper presents the results of a comparative study of filtering methods for feature selection in web-document clustering. First, we focused on feature selection methods based on Mutual Information (MI) and Information Gain (IG). Using the MI and IG values of the extracted features, we identified representative max-value features in the documents, as well as a representative cluster for each feature and a representative cluster for each document. Second, we tested the Max Feature Selection Method (MFSM) with these representative features and clusters, and evaluated web-document clustering performance. However, for document sets that cluster poorly under term frequency, the MFSM cannot obtain good features from the MI and IG values. We therefore propose two new filtering methods: Min Count of Representative Cluster for a Feature (MCRCF) and Min Count of Representative Cluster for a Document (MCRCD). In the experiments, the MFSM performed better than term frequency, MI, or IG alone, and applying the new filtering methods (MCRCF, MCRCD) improved clustering performance notably. Thus, these filtering methods are effective means of feature selection and offer good performance in web-document clustering. © Springer-Verlag Berlin Heidelberg 2007.
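The abstract does not spell out how MFSM, MCRCF, or MCRCD are computed, so the following is only a minimal sketch of the MI and IG scoring they build on. It assumes the textbook definitions over binary term presence and hard cluster labels: pointwise MI between a term and a cluster, and IG of a term over the cluster partition. Treating a feature's representative cluster as the cluster with the highest MI, and keeping the top-k features by IG, are hypothetical readings. All function names (`term_cluster_stats`, `mutual_information`, `information_gain`, `representative_cluster`, `select_top_features`) are illustrative and do not come from the paper.

```python
import math
from collections import Counter


def term_cluster_stats(docs, labels):
    """Count document-level term presence per cluster.

    docs: iterable of term lists (one per document); labels: cluster id per document.
    """
    n = len(labels)
    cluster_count = Counter(labels)
    term_count = Counter()
    joint = Counter()  # (term, cluster) -> number of documents
    for terms, c in zip(docs, labels):
        for t in set(terms):  # presence only, so duplicates are ignored
            term_count[t] += 1
            joint[(t, c)] += 1
    return n, cluster_count, term_count, joint


def mutual_information(t, c, stats):
    """Pointwise MI: log P(t, c) / (P(t) P(c))."""
    n, cc, tc, joint = stats
    p_tc = joint[(t, c)] / n
    if p_tc == 0:
        return float("-inf")  # term never occurs in this cluster
    return math.log(p_tc / ((tc[t] / n) * (cc[c] / n)))


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


def information_gain(t, stats):
    """IG(t) = H(C) - P(t) H(C | t) - P(not t) H(C | not t)."""
    n, cc, tc, joint = stats
    n_t = tc[t]
    n_not = n - n_t
    h_c = entropy([cc[c] / n for c in cc])
    h_t = entropy([joint[(t, c)] / n_t for c in cc]) if n_t else 0.0
    h_not = entropy([(cc[c] - joint[(t, c)]) / n_not for c in cc]) if n_not else 0.0
    return h_c - (n_t / n) * h_t - (n_not / n) * h_not


def representative_cluster(t, stats):
    """Hypothetical: the cluster in which term t has its maximum MI."""
    _, cc, _, _ = stats
    return max(cc, key=lambda c: mutual_information(t, c, stats))


def select_top_features(stats, k):
    """Keep the k terms with the highest IG (ties broken alphabetically)."""
    _, _, tc, _ = stats
    return sorted(tc, key=lambda t: (-information_gain(t, stats), t))[:k]


if __name__ == "__main__":
    docs = [["a", "b"], ["a", "c"], ["d", "e"], ["d", "c"]]
    labels = [0, 0, 1, 1]
    stats = term_cluster_stats(docs, labels)
    print(select_top_features(stats, 2))  # ['a', 'd']
    print(representative_cluster("a", stats), representative_cluster("d", stats))  # 0 1
```

In this toy example, `a` and `d` each occur in only one cluster, so they receive the maximum IG of ln 2. Term `c` is spread evenly across both clusters and scores 0. This is the kind of discrimination a max-value selection step relies on; the paper's actual filtering criteria (MCRCF, MCRCD) are not reproduced here.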

Citation (APA)

Park, H., & Kwon, H. C. (2007). Filtering methods for feature selection in web-document clustering. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4488 LNCS, pp. 1218–1221). Springer Verlag. https://doi.org/10.1007/978-3-540-72586-2_170