A comparative study of statistical feature reduction methods for Arabic text categorization


Abstract

Feature reduction methods have been successfully applied to text categorization. In this paper, we perform a comparative study of three feature reduction methods for text categorization: Document Frequency (DF), Term Frequency Inverse Document Frequency (TFIDF) and Latent Semantic Analysis (LSA). Our feature set is relatively large, since the text files contain thousands of distinct terms. We propose using these feature reduction methods as a preprocessor for a Back-Propagation Neural Network (BPNN) to reduce the size of the input data during training. The experimental results on an Arabic data set demonstrate that, among the three dimensionality reduction techniques compared, TFIDF was the most effective in reducing the dimensionality of the feature space. © 2010 Springer-Verlag Berlin Heidelberg.
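The two purely statistical criteria named in the abstract, DF and TFIDF, can be illustrated with a minimal Python sketch. It assumes documents are already tokenized (real Arabic text would also need normalization and stemming, which are not shown). It also assumes that a term's TFIDF score is its TF-IDF weight summed over all training documents. The paper's exact weighting and aggregation are not given here, and the function names are hypothetical. LSA, which instead projects the term-document matrix onto its leading singular vectors (for example via `numpy.linalg.svd`), is not shown.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count the number of documents each term appears in."""
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    return df


def select_by_df(docs, k):
    """Keep the k terms with the highest document frequency.

    Ties are broken by the term string so the result is deterministic.
    """
    df = document_frequency(docs)
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:k]


def select_by_tfidf(docs, k):
    """Keep the k terms with the highest summed TF-IDF weight.

    Assumed weighting: raw term count * log(N / df). A term that occurs
    in every document therefore scores 0.
    """
    n = len(docs)
    df = document_frequency(docs)
    score = Counter()
    for tokens in docs:
        for term, count in Counter(tokens).items():
            score[term] += count * math.log(n / df[term])
    ranked = sorted(score, key=lambda t: (-score[t], t))
    return ranked[:k]


def to_vector(tokens, vocabulary):
    """Project a document onto the reduced vocabulary as term counts,
    e.g. to form the input layer of a BPNN."""
    tf = Counter(tokens)
    return [tf[t] for t in vocabulary]


# Toy corpus of three tokenized documents (illustrative data only).
docs = [
    ["كرة", "مباراة", "فريق"],
    ["اقتصاد", "سوق", "فريق"],
    ["كرة", "فريق", "هدف"],
]
print(select_by_df(docs, 2))
print(select_by_tfidf(docs, 6))
print(to_vector(["كرة", "كرة", "هدف"], select_by_df(docs, 2)))
```

On this toy corpus the two criteria disagree. DF ranks "فريق" first because it occurs in every document, while the assumed TF-IDF weighting gives it a score of zero and places it last. The two criteria can therefore keep quite different vocabularies for the same target size.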


Harrag, F., El-Qawasmeh, E., & Al-Salman, A. M. S. (2010). A comparative study of statistical feature reduction methods for Arabic text categorization. In Communications in Computer and Information Science (Vol. 88 CCIS, pp. 676–682). https://doi.org/10.1007/978-3-642-14306-9_67
