Feature reduction methods have been successfully applied to text categorization. In this paper, we perform a comparative study of three feature reduction methods for text categorization: Document Frequency (DF), Term Frequency-Inverse Document Frequency (TFIDF), and Latent Semantic Analysis (LSA). Our feature set is relatively large, since thousands of distinct terms occur across the text files. We propose using these feature reduction methods as a preprocessing step for a Back-Propagation Neural Network (BPNN) to reduce the size of the input data during training. Experimental results on an Arabic data set show that, of the three dimensionality reduction techniques compared, TFIDF is the most effective at reducing the dimensionality of the feature space. © 2010 Springer-Verlag Berlin Heidelberg.
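The sketch below illustrates DF- and TFIDF-based term selection as a preprocessing step that shrinks each document to a fixed-length vector, the kind of input a BPNN would then be trained on. It is not the authors' implementation. The scoring choices are assumptions: raw term counts as tf, log(N / df) as idf, a term's TFIDF score summed over all documents, and a simple top-k cutoff. The function names are hypothetical. LSA, which projects the term-document matrix onto its leading singular vectors, is omitted because it requires an SVD routine.

```python
import math
from collections import Counter


def document_frequency(docs):
    """Count the number of documents in which each term appears at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return df


def select_by_df(docs, k):
    """DF selection (assumed variant): keep the k terms with the highest
    document frequency, breaking ties alphabetically."""
    df = document_frequency(docs)
    ranked = sorted(df, key=lambda t: (-df[t], t))
    return ranked[:k]


def select_by_tfidf(docs, k):
    """TFIDF selection (assumed variant): score each term by its TF-IDF weight
    summed over all documents, with tf = raw count and idf = log(N / df),
    then keep the k highest-scoring terms (ties broken alphabetically)."""
    n = len(docs)
    df = document_frequency(docs)
    score = Counter()
    for doc in docs:
        for term, tf in Counter(doc).items():
            score[term] += tf * math.log(n / df[term])
    ranked = sorted(score, key=lambda t: (-score[t], t))
    return ranked[:k]


def vectorize(doc, vocab):
    """Map a tokenized document to a count vector over the reduced vocabulary.
    This fixed-length vector is what would be fed to the network's input layer."""
    counts = Counter(doc)
    return [counts[t] for t in vocab]


if __name__ == "__main__":
    # Toy, already-tokenized corpus (placeholder tokens, not the paper's Arabic data).
    docs = [
        ["economy", "market", "bank", "market"],
        ["football", "match", "goal", "market"],
        ["economy", "bank", "loan"],
        ["match", "goal", "goal", "team"],
    ]
    print(select_by_df(docs, 3))        # ['bank', 'economy', 'goal']
    vocab = select_by_tfidf(docs, 2)
    print(vocab)                        # ['goal', 'market']
    print(vectorize(docs[0], vocab))    # [0, 2]
```

Terms that occur in every document receive an idf of zero, so the TFIDF variant tends to drop them even when DF ranks them highly. This is one plausible reason the two criteria can select quite different feature sets from the same vocabulary.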
Harrag, F., El-Qawasmeh, E., & Al-Salman, A. M. S. (2010). A comparative study of statistical feature reduction methods for Arabic text categorization. In Communications in Computer and Information Science (Vol. 88 CCIS, pp. 676–682). https://doi.org/10.1007/978-3-642-14306-9_67