Feature selection (FS) is a widely used method for removing redundant or irrelevant features in order to improve classification accuracy and reduce a model’s computational cost. In this paper, we present an improved method (referred to hereafter as RARF) for Arabic text classification (ATC) that combines term frequency-inverse document frequency (TF-IDF) weighting with the Word2Vec embedding technique to identify words that share a particular semantic relationship. We compare our method with four benchmark FS methods, namely principal component analysis (PCA), linear discriminant analysis (LDA), chi-square, and mutual information (MI). Three machine learning classifiers are used in this work: support vector machine (SVM), k-nearest neighbors (K-NN), and naive Bayes (NB). Two different Arabic datasets are used to perform a comparative analysis of these algorithms. The paper also evaluates the efficiency of our method for ATC using four performance metrics: accuracy, precision, recall, and F-measure. Results show that the highest accuracy, 94.75%, was achieved by the SVM classifier on the Khaleej-2004 Arabic dataset, while the same classifier reached an accuracy of 94.01% on the Watan-2004 Arabic dataset.
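As a rough illustration of the TF-IDF weighting mentioned above, the following minimal sketch computes term weights for a toy corpus. The abstract does not state which TF-IDF variant RARF uses, so the formula here (term count divided by document length, multiplied by log(N / document frequency)) is an assumption, as are the function name `tf_idf` and the English placeholder tokens. It does not reproduce the Word2Vec step or the RARF selection procedure.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus with placeholder tokens (hypothetical, not from the datasets).
docs = [
    ["economy", "market", "bank"],
    ["sport", "match", "goal"],
    ["economy", "match", "news"],
]
weights = tf_idf(docs)
```

In this toy corpus, a term that occurs in only one document (such as "bank") receives a higher weight than one shared by two documents (such as "economy"), which is the property a TF-IDF-based feature selector relies on.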
CITATION STYLE
Sabri, T., Bahassine, S., El Beggar, O., & Kissi, M. (2024). An improved Arabic text classification method using word embedding. International Journal of Electrical and Computer Engineering, 14(1), 721–731. https://doi.org/10.11591/ijece.v14i1.pp721-731