Study for Automatic Classification of Arabic Spoken Documents

Abstract

One important task in natural language processing is the classification of speech by domain. As far as the literature shows, no prior study has addressed this problem, especially the effect of using root N-grams and stem N-grams on Arabic speech classification performance. In this paper we describe a study of Arabic spoken document classification using the K-Nearest Neighbor, Naïve Bayes and Support Vector Machine classifiers. We build a speech recognition system to transcribe Arabic audio files. We then use four types of features: 1-gram, 2-gram and 3-gram word roots or stems, as well as full words. The results show that, compared with stem or word N-grams, using 1-gram roots as features gives better performance for Arabic speech classification. Classification performance decreases as the N-gram length increases. The results also show that the Support Vector Machine outperforms Naïve Bayes and K-Nearest Neighbor with 1-grams. When K-Nearest Neighbor is used, the 2-gram root achieves the best performance. The 3-gram root, on the other hand, achieves the best performance when the Support Vector Machine is used.
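
To make the feature types in the abstract concrete, the following is a minimal sketch of how 1-gram, 2-gram and 3-gram features might be counted over a transcript that has already been reduced to roots or stems. It is an illustration only, not the authors' implementation: the function names `ngrams` and `ngram_counts` and the transliterated sample roots are hypothetical, and root or stem extraction is assumed to have been done beforehand.

```python
from collections import Counter
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every contiguous run of n tokens, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count how often each n-gram occurs in one transcribed document."""
    return Counter(ngrams(tokens, n))


# Hypothetical transliterated roots standing in for one transcript.
roots = ["ktb", "drs", "ktb"]

unigram_features = ngram_counts(roots, 1)  # 1-gram root features
bigram_features = ngram_counts(roots, 2)   # 2-gram root features
trigram_features = ngram_counts(roots, 3)  # 3-gram root features
```

Count dictionaries like these could then be turned into vectors and passed to any of the three classifiers named in the abstract. The same functions apply unchanged to stems or full words, since only the token list differs.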

Cite

Labidi, M., Maraoui, M., & Zrigui, M. (2017). Study for Automatic Classification of Arabic Spoken Documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10449 LNAI, pp. 459–468). Springer Verlag. https://doi.org/10.1007/978-3-319-67077-5_44
