A comparative study of statistical feature reduction methods for Arabic text categorization

Fouzi Harrag; Eyas El-Qawasmeh; Abdul Malik S. Al-Salman

Conference Proceedings

Communications in Computer and Information Science (2010) 88 CCIS(PART 2) 676-682

DOI: 10.1007/978-3-642-14306-9_67

Abstract

Feature reduction methods have been successfully applied to text categorization. In this paper, we perform a comparative study of three feature reduction methods for text categorization: Document Frequency (DF), Term Frequency-Inverse Document Frequency (TFIDF), and Latent Semantic Analysis (LSA). Our feature set is relatively large, since the text files contain thousands of distinct terms. We propose using these feature reduction methods as a preprocessor for a Back-Propagation Neural Network (BPNN) to reduce the size of its input data during training. The experimental results on an Arabic data set demonstrate that, among the three dimensionality reduction techniques studied, TFIDF was the most effective in reducing the dimensionality of the feature space. © 2010 Springer-Verlag Berlin Heidelberg.
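To make the DF and TFIDF reduction steps concrete, here is a minimal Python sketch of term selection ahead of a neural-network classifier. It is not the authors' implementation. It assumes documents are already tokenized (and, for Arabic, normalized or stemmed), and it scores each term for TFIDF selection by its maximum TF-IDF weight across documents. That aggregation rule, the cut-off `k`, the helper names, and the toy placeholder tokens are all hypothetical choices made for illustration. LSA, the third method, instead projects the term-document matrix onto its leading singular vectors and is not shown here.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count how many documents each term appears in."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def select_by_df(docs, k):
    """DF reduction: keep the k terms occurring in the most documents (ties broken alphabetically)."""
    df = document_frequency(docs)
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:k]


def select_by_tfidf(docs, k):
    """TFIDF reduction (assumed scoring): rank terms by their maximum TF-IDF weight over all documents."""
    n = len(docs)
    df = document_frequency(docs)
    best = {}
    for doc in docs:
        tf = Counter(doc)
        for term, count in tf.items():
            # Normalized term frequency times log inverse document frequency.
            weight = (count / len(doc)) * math.log(n / df[term])
            best[term] = max(best.get(term, 0.0), weight)
    ranked = sorted(best, key=lambda t: (-best[t], t))
    return ranked[:k]


def to_vector(doc, vocab):
    """Project one tokenized document onto the reduced vocabulary as raw term counts (the network's input)."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]


# Hypothetical toy corpus of already-tokenized documents.
docs = [
    ["the", "match", "goal", "goal"],
    ["the", "market", "bank"],
    ["the", "match", "team"],
]

print(select_by_df(docs, 2))     # ['the', 'match']
print(select_by_tfidf(docs, 1))  # ['goal']
```

The toy corpus illustrates one design difference. DF favours the term that appears everywhere ("the"), whereas its IDF factor is log(3/3) = 0, so TFIDF ranks it last. Either reduced vocabulary would then be passed to `to_vector` to build fixed-length inputs for the BPNN.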

Cite (APA)

Harrag, F., El-Qawasmeh, E., & Al-Salman, A. M. S. (2010). A comparative study of statistical feature reduction methods for Arabic text categorization. In Communications in Computer and Information Science (Vol. 88 CCIS, pp. 676–682). https://doi.org/10.1007/978-3-642-14306-9_67
