A study on feature weighting in Chinese text categorization

Abstract

In text categorization (TC) based on the vector space model, feature weighting and feature selection are two of the main difficulties. This paper proposes two feature-weighting methods that combine the relevant influencing factors. A TC system for Chinese texts is designed with character bigrams as features. Experiments on a collection of 71,674 texts show that the system reaches an F1 of 85.9%, about 5% higher than the well-known TF*IDF weighting scheme. In addition, a multi-step feature selection process is used in the system to reduce the dimensionality of the feature space effectively.
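To make the terms in the abstract concrete, the following is a minimal Python sketch of character-bigram feature extraction, the baseline TF*IDF weighting, and the F1 metric. It is not the paper's method: the two proposed weighting schemes are not specified in the abstract, so only the baseline is shown. The function names (`char_bigrams`, `tfidf_vectors`, `f1_score`), the raw-count term frequency, and the `log(N / df)` form of IDF are all assumptions made for illustration.

```python
import math
from collections import Counter


def char_bigrams(text):
    """Return overlapping character bigrams of `text`, skipping whitespace."""
    chars = [c for c in text if not c.isspace()]
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def tfidf_vectors(docs):
    """Weight each bigram by tf * log(N / df) (one common TF*IDF variant).

    Assumption: tf is the raw count in the document and df is the number
    of documents containing the bigram; the paper may use another variant.
    """
    counts_per_doc = [Counter(char_bigrams(d)) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents each bigram occurs.
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())

    return [
        {term: tf * math.log(n_docs / df[term]) for term, tf in counts.items()}
        for counts in counts_per_doc
    ]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    docs = ["中文文本", "中文分类"]
    print(char_bigrams(docs[0]))
    print(tfidf_vectors(docs)[0])
```

Under this variant, a bigram that occurs in every document gets weight zero, which is one reason weighting schemes that add further factors are of interest.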

Cite

Xue, D., & Sun, M. (2003). A study on feature weighting in Chinese text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2588, pp. 592–601). Springer Verlag. https://doi.org/10.1007/3-540-36456-0_66
