Study for Automatic Classification of Arabic Spoken Documents

Mohamed Labidi; Mohsen Maraoui; Mounir Zrigui

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10449 LNAI 459-468

DOI: 10.1007/978-3-319-67077-5_44

Abstract

One of the important tasks in natural language processing is speech classification by domain. According to the literature, no prior study has addressed this problem, especially the effect of using root N-grams and stem N-grams on Arabic speech classification performance. In this paper we describe a study of Arabic spoken document classification using the K-Nearest Neighbor, Naive Bayes and Support Vector Machine classifiers. We create a speech recognition system to transcribe Arabic audio files. Then, we use four types of features: 1-gram, 2-gram and 3-gram word roots or stems, as well as full words. The results show that, compared to stem or word N-grams, using 1-gram roots as features provides better performance for Arabic speech classification. Classification performance also decreases as the N-gram order increases. The results further show that the Support Vector Machine outperforms Naive Bayes and K-Nearest Neighbor with 1-gram features. When K-Nearest Neighbor is used, the 2-gram root achieves the best performance. The 3-gram root, on the other hand, achieves the best performance when the Support Vector Machine is used.
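
As a rough illustration of the feature setup the abstract describes, the sketch below builds bag-of-N-gram count vectors over token sequences and classifies a query with a simple k-nearest-neighbor vote using cosine similarity. It is not the authors' implementation: the speech recognition and root or stem extraction steps are assumed to have run upstream, the English tokens are placeholders standing in for Arabic roots, and all names (`ngrams`, `features`, `cosine`, `knn_predict`, `TRAIN`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngrams(tokens, n):
    """Return the list of contiguous n-grams (as tuples) over a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def features(tokens, n):
    """Bag-of-n-grams count vector, stored sparsely as a Counter."""
    return Counter(ngrams(tokens, n))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, query_tokens, n=1, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = features(query_tokens, n)
    scored = sorted(
        ((cosine(features(tokens, n), query), label) for tokens, label in train),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each token stands in for a root produced upstream
# (transcription followed by root extraction); labels are example domains.
TRAIN = [
    ("match goal team player goal".split(), "sport"),
    ("team win match league".split(), "sport"),
    ("player score goal league".split(), "sport"),
    ("market bank price stock".split(), "economy"),
    ("bank loan price market".split(), "economy"),
    ("stock trade market bank".split(), "economy"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "team score goal match".split(), n=1, k=3))
    print(knn_predict(TRAIN, "bank stock price".split(), n=1, k=3))
```

Raising `n` to 2 or 3 makes the feature vectors much sparser on short documents, since fewer N-grams recur across documents. This is one plausible reason, offered here only as an assumption, why higher-order N-grams can hurt classification on small corpora.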

Cite

Labidi, M., Maraoui, M., & Zrigui, M. (2017). Study for Automatic Classification of Arabic Spoken Documents. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10449 LNAI, pp. 459–468). Springer Verlag. https://doi.org/10.1007/978-3-319-67077-5_44
