Comparative Study of CNN Structures for Arabic Speech Recognition

Zoubir Talai; Nada Kherici; Halima Bahi

Journal Article, Open Access

Ingenierie des Systemes d'Information (2023) 28(2) 327-333

DOI: 10.18280/isi.280208

Abstract

Speech recognition is an essential human ability and is crucial for communication. Consequently, automatic speech recognition (ASR) is a major area of research that increasingly uses artificial intelligence techniques to replicate this ability. Among these techniques, deep learning (DL) models attract much attention, in particular convolutional neural networks (CNNs), which are known for their ability to model spatial relationships. In this article, three CNN architectures that performed well in recognized competitions, AlexNet, ResNet, and GoogLeNet, were implemented to compare their performance in Arabic speech recognition. The models were compared on a corpus of Arabic spoken digits collected from various sources, including messaging and social media applications, in addition to an online corpus. AlexNet, ResNet, and GoogLeNet achieved accuracies of 86.19%, 83.46%, and 89.61%, respectively. The results show the superiority of GoogLeNet and underline the potential of CNN architectures to model acoustic features of low-resource languages such as Arabic.

Citation (APA)

Talai, Z., Kherici, N., & Bahi, H. (2023). Comparative Study of CNN Structures for Arabic Speech Recognition. Ingenierie Des Systemes d’Information, 28(2), 327–333. https://doi.org/10.18280/isi.280208
