Three new feature weighting methods for text categorization

Wei Xue; Xinshun Xu

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2010), Vol. 6318 LNCS (M4D), pp. 352-359

DOI: 10.1007/978-3-642-16515-3_44


Abstract

Feature weighting is an important phase of text categorization: it assigns a weight to each feature of a document. This paper proposes three new feature weighting methods for text categorization. In the first and second methods, the traditional tf×idf weighting is combined in a moderate manner with "one-sided" feature selection metrics, namely odds ratio (OR) and the correlation coefficient (CC), and positive and negative features are weighted separately. The resulting schemes, tf×idf+CC and tf×idf+OR, are used to calculate the feature weights. In the third method, tf is combined with feature entropy, which is both effective and concise. Feature entropy measures how diversely a feature's document frequency is distributed across categories. Experimental results on the Reuters-21578 corpus show that the proposed methods outperform several state-of-the-art feature weighting methods, such as tf×idf, tf×CHI, and tf×OR. © 2010 Springer-Verlag Berlin Heidelberg.
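The ingredients named in the abstract can be illustrated with a minimal Python sketch. It uses textbook definitions: tf×idf with idf = log(N/df), feature entropy as the Shannon entropy of a term's per-category document frequencies, and the one-sided correlation coefficient CC = sqrt(N)(AD - CB) / sqrt((A+C)(B+D)(A+B)(C+D)). The tf×entropy combination tf × (1 - H / log K), where K is the number of categories, is an assumption chosen so that terms concentrated in few categories get higher weights. It is not the formula from the paper. The paper's tf×idf+CC and tf×idf+OR schemes, which weight positive and negative features separately, are not reproduced. All function names are hypothetical.

```python
import math
from collections import Counter


def idf(term, docs):
    """Classic idf = log(N / df); docs is a list of token lists."""
    n = len(docs)
    df = sum(1 for d in docs if term in d)
    return math.log(n / df) if df else 0.0


def tf_idf(doc, docs):
    """Raw term frequency times idf for every term in one document."""
    tf = Counter(doc)
    return {t: c * idf(t, docs) for t, c in tf.items()}


def feature_entropy(term, docs, labels):
    """Shannon entropy of the term's document frequency across categories.

    p_k = df_k / sum_j df_j and H = -sum_k p_k log p_k.
    H is 0 when the term occurs in only one category and reaches log K
    when its documents are spread evenly over all K categories.
    """
    df_per_cat = Counter(lab for d, lab in zip(docs, labels) if term in d)
    total = sum(df_per_cat.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in df_per_cat.values())


def tf_entropy(doc, docs, labels):
    """ASSUMED combination (not the paper's formula): tf * (1 - H / log K)."""
    k = len(set(labels))
    max_h = math.log(k) if k > 1 else 1.0  # avoid dividing by log(1) = 0
    tf = Counter(doc)
    return {
        t: c * (1 - feature_entropy(t, docs, labels) / max_h)
        for t, c in tf.items()
    }


def correlation_coefficient(term, category, docs, labels):
    """One-sided CC: positive for terms indicating `category`, negative otherwise.

    A: in category, contains term     B: not in category, contains term
    C: in category, lacks term        D: not in category, lacks term
    """
    a = b = c = d = 0
    for doc, lab in zip(docs, labels):
        has = term in doc
        if has and lab == category:
            a += 1
        elif has:
            b += 1
        elif lab == category:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = math.sqrt((a + c) * (b + d) * (a + b) * (c + d))
    return math.sqrt(n) * (a * d - c * b) / denom if denom else 0.0


# Toy corpus (hypothetical): two categories, two documents each.
docs = [["wheat", "price"], ["wheat", "crop"], ["oil", "price"], ["oil", "barrel"]]
labels = ["grain", "grain", "crude", "crude"]

print(tf_idf(docs[0], docs))                                   # wheat and price both get log 2
print(tf_entropy(docs[0], docs, labels))                       # wheat ~1.0, price ~0.0
print(correlation_coefficient("wheat", "grain", docs, labels))  # 2.0
print(correlation_coefficient("oil", "grain", docs, labels))    # -2.0
```

In this toy corpus "price" occurs in both categories, so its entropy is maximal and the assumed tf×entropy weight drops to zero, whereas tf×idf still gives it the same weight as "wheat". The sign of CC shows why it is called "one-sided": "oil" is a negative feature for the grain category rather than a neutral one.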


Citation (APA):

Xue, W., & Xu, X. (2010). Three new feature weighting methods for text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6318 LNCS, pp. 352–359). https://doi.org/10.1007/978-3-642-16515-3_44
