A simple probability based term weighting scheme for automated text classification

Abstract

In automated text classification, tfidf is often considered the default term weighting scheme and has been widely reported in the literature. However, tfidf does not directly reflect a term's category membership. Inspired by an analysis of various feature selection methods, we propose a simple probability-based term weighting scheme that directly uses two critical information ratios, i.e. relevance indicators. These relevance indicators are supported by probability estimates that embody category membership. Our experimental study on two data sets, including Reuters-21578, demonstrates that the proposed probability-based term weighting scheme significantly outperforms tfidf with both a Bayesian classifier and Support Vector Machines (SVM). © Springer-Verlag Berlin Heidelberg 2007.
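The abstract's starting point, that tfidf ignores category membership, can be illustrated with a minimal sketch. The code below assumes the common tf × log(N/df) form of tfidf, and it uses a hypothetical helper, `category_ratio`, that estimates P(category | term) from labelled documents. That helper is only an example of a probability estimate that reflects category membership; it is not the weighting scheme proposed in the paper, whose formula the abstract does not give. The toy documents and labels are invented for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Note that category labels are not an input: tfidf never sees them.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


def category_ratio(docs, labels, term, category):
    """Hypothetical helper (an assumption, not the paper's formula).

    Estimates P(category | term) as the fraction of documents containing
    `term` that are labelled `category`.
    """
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    if not with_term:
        return 0.0
    return sum(lab == category for lab in with_term) / len(with_term)


# Toy corpus, invented for illustration.
docs = [
    ["wheat", "price", "rise"],
    ["wheat", "export", "price"],
    ["bank", "rate", "price"],
    ["bank", "loan", "rate"],
]
labels = ["grain", "grain", "money", "money"]

w = tfidf(docs)
for term in ("wheat", "price"):
    print(term, round(w[0][term], 3), category_ratio(docs, labels, term, "grain"))
```

In this toy corpus "wheat" occurs only in documents labelled grain, while "price" is spread across both categories. The tfidf weights capture only how rare each term is across documents, whereas the label-aware ratio separates the two terms by category, which is the kind of information the abstract says tfidf does not reflect directly.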

Cite

Liu, Y., & Loh, H. T. (2007). A simple probability based term weighting scheme for automated text classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4570 LNAI, pp. 33–43). Springer Verlag. https://doi.org/10.1007/978-3-540-73325-6_4
