In this paper, we propose a probabilistic approach to feature selection for multi-class text categorization. Specifically, we treat the document class and the occurrence of each feature as events, compute the probability of occurrence of each feature using the law of total probability, and use these values as a ranking criterion. Experiments on the Reuters-2000 collection show that the proposed method can yield better performance than information gain and the χ² (chi-square) statistic, two well-known feature selection methods. © Springer-Verlag Berlin Heidelberg 2007.
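The ranking idea can be illustrated with a minimal sketch. The abstract does not say how the probabilities are estimated, so everything below is an assumption made for illustration: the function name `rank_features`, the `uniform_prior` switch, and the relative-frequency estimates of P(c) and P(t | c) are hypothetical and not taken from the paper. The sketch only applies the law of total probability, P(t) = Σ_c P(c) · P(t | c), and sorts features by that value.

```python
from collections import Counter, defaultdict


def rank_features(docs, labels, uniform_prior=False):
    """Rank terms by P(t) = sum over classes c of P(c) * P(t | c).

    Illustrative assumptions (not from the paper):
    - P(t | c) is the fraction of class-c documents that contain t.
    - P(c) is the class's share of documents, or 1/|C| if uniform_prior.
    """
    n_docs = len(docs)
    class_sizes = Counter(labels)

    # doc_freq[c][t] = number of class-c documents containing term t
    doc_freq = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        doc_freq[label].update(set(tokens))

    vocab = set().union(*(set(d) for d in docs))
    scores = {}
    for term in vocab:
        total = 0.0
        for c, size in class_sizes.items():
            prior = 1.0 / len(class_sizes) if uniform_prior else size / n_docs
            total += prior * doc_freq[c][term] / size
        scores[term] = total

    # Highest probability first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    docs = [["oil", "price"], ["oil", "export"], ["wheat", "price"]]
    labels = ["crude", "crude", "grain"]
    for term, score in rank_features(docs, labels, uniform_prior=True):
        print(f"{term}\t{score:.3f}")
```

With relative-frequency class priors this sum reduces to the plain document frequency of the term, so any ranking that differs from document frequency has to come from how P(c) and P(t | c) are chosen; the `uniform_prior` option is included only to make that dependence visible.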
Wu, K., Lu, B. L., Uchiyama, M., & Isahara, H. (2007). A probabilistic approach to feature selection for multi-class text categorization. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4491 LNCS, pp. 1310–1317). Springer Verlag. https://doi.org/10.1007/978-3-540-72383-7_153