Arabic texts categorization: Features selection based on the extraction of words’ roots

Abstract

One of the methods used to reduce the size of the term vocabulary in Arabic text categorization is to replace the different variants (forms) of a word by their common root. Root extraction is harder for Arabic than for many other languages because Arabic has a very different structure: it is a very rich language with a complex morphology. Many algorithms have been proposed in this field. Some are based on morphological rules and grammatical patterns, so they are quite complex and require deep linguistic knowledge. Others are statistical, so they are simpler and rely only on numerical computations. In this paper we propose a new statistical algorithm that extracts the roots of Arabic words using character n-grams, without any morphological rules or grammatical patterns.
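The abstract does not describe the algorithm's internals. The following is only a minimal sketch of the general idea of rule-free, n-gram-based root selection, under assumptions that are ours and not the authors': a precomputed list of candidate roots (`ROOTS`), character bigrams, and the Dice coefficient as the similarity measure. `char_ngrams`, `dice`, `extract_root`, and `min_score` are hypothetical names. Each word is mapped to the candidate root whose bigram set overlaps most with its own, and the vocabulary shrinks as variants collapse onto one root.

```python
# Hypothetical sketch: NOT the paper's algorithm, just n-gram similarity to candidate roots.

def char_ngrams(word: str, n: int = 2) -> set[str]:
    """Set of overlapping character n-grams of `word` (the word itself if shorter than n)."""
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient between the character n-gram sets of two strings."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def extract_root(word: str, roots: list[str], n: int = 2, min_score: float = 0.0) -> str:
    """Return the candidate root most similar to `word`,
    or `word` unchanged when no candidate scores above `min_score` (assumed fallback)."""
    best = max(roots, key=lambda r: dice(word, r, n))
    return best if dice(word, best, n) > min_score else word


# Assumed candidate-root list: write, study, know.
ROOTS = ["كتب", "درس", "علم"]
# library, the book, school, they study, the teacher
words = ["مكتبة", "الكتاب", "مدرسة", "يدرسون", "المعلم"]

mapping = {w: extract_root(w, ROOTS) for w in words}
vocab = set(mapping.values())
print(len(words), "->", len(vocab))  # 5 -> 3
```

Five surface forms reduce to three roots here. Note that Arabic roots are often interrupted by infixed letters (for example the alif in الكتاب), so contiguous bigrams capture them only partially; that word still maps to كتب, but with a lower score than مكتبة.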

Gadri, S., & Moussaoui, A. (2015). Arabic texts categorization: Features selection based on the extraction of words’ roots. In IFIP Advances in Information and Communication Technology (Vol. 456, pp. 167–180). Springer New York LLC. https://doi.org/10.1007/978-3-319-19578-0_14