Effective Arabic Stemmer Based Hybrid Approach for Arabic Text Categorization

  • Hadni M
  • Ouatik S
  • Lachkar A
N/ACitations
Citations of this article
25Readers
Mendeley users who have this article in their library.

Abstract

Text pre-processing of Arabic Language is a challenge and crucial stage in Text Categorization (TC) particularly and Text Mining (TM) generally. Stemming algorithms can be employed in Arabic text pre-processing to reduces words to their stems/or root. Arabic stemming algorithms can be ranked, according to three category, as root-based approach (ex. Khoja); stem-based approach (ex. Larkey); and statistical approach (ex. N-Garm). However, no stemming of this language is perfect: The existing stemmers have a small efficiency. In this paper, in order to improve the accuracy of stemming and therefore the accuracy of our proposed TC system, an efficient hybrid method is proposed for stemming Arabic text. The effectiveness of the aforementioned four methods was evaluated and compared in term of the F-measure of the Naïve Bayesian classifier and the Support Vector Machine classifier used in our TC system. The proposed stemming algorithm was found to supersede the other stemming ones: The obtained results illustrate that using the proposed stemmer enhances greatly the performance of Arabic Text Categorization.

Cite

CITATION STYLE

APA

Hadni, M., Ouatik, S. A., & Lachkar, A. (2013). Effective Arabic Stemmer Based Hybrid Approach for Arabic Text Categorization. International Journal of Data Mining & Knowledge Management Process, 3(4), 1–14. https://doi.org/10.5121/ijdkp.2013.3401

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free