A Deep Diacritics-Based Recognition Model for Arabic Speech: Quranic Verses as Case Study

Abstract

Arabic is spoken by more than 422 million people worldwide. Although classic Arabic is the language of the Quran, which 1.9 billion Muslims are required to recite, Arabic speech recognition remains limited. In classic Arabic, diacritics affect the pronunciation of a word, and a change in a single diacritic can change the word's meaning. However, most Arabic speech recognition models discard diacritics. This work aims to recognize classic Arabic speech while considering diacritics by converting audio signals to diacritized text using Deep Neural Network (DNN)-based models. DNN-based recognition has outperformed traditional speech recognition systems, which depend on phonetics. Three models were developed to recognize Arabic speech: (i) Time Delay Neural Network (TDNN)-Connectionist Temporal Classification (CTC), (ii) Recurrent Neural Network (RNN)-CTC, and (iii) transformer. A 100-hour dataset of Quran recordings was used. The RNN-CTC model obtained state-of-the-art results, with the lowest word error rate (19.43%) and a character error rate of 3.51%. The RNN-CTC model recognizes speech character by character, which is more reliable than the transformer's whole-sentence recognition behaviour. The model performed well on clear, unstressed recordings of short sentences. Moreover, the RNN-CTC model effectively recognized sounds from outside the dataset. The findings recommend continued effort to enhance diacritics-based Arabic speech recognition models, using clear and unstressed recordings to obtain better performance. Moreover, pretraining large speech models could yield more accurate recognition. The outcomes can be used to enhance existing classic Arabic speech recognition solutions by supporting diacritics recognition.
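
The word error rate and character error rate quoted above are conventionally computed as an edit distance divided by the reference length. The following is a minimal sketch of that convention, not the authors' evaluation code. The function names are hypothetical, and counting each diacritic as its own character (one Unicode code point) is an assumption that the abstract does not confirm.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / number of reference words."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, treating every code point (including each
    diacritic) as one character. This is an assumption of the sketch."""
    return edit_distance(reference, hypothesis) / len(reference)


# Same three consonants (kaf, ta, ba), different diacritics:
# "kataba" (he wrote) versus "kutiba" (it was written).
kataba = "\u0643\u064E\u062A\u064E\u0628\u064E"
kutiba = "\u0643\u064F\u062A\u0650\u0628\u064E"

print(wer(kataba, kutiba))            # 1.0
print(round(cer(kataba, kutiba), 3))  # 0.333
```

In this example two wrong diacritics make the whole word count as an error while two thirds of its characters are still correct, which is one reason a diacritized system can report a word error rate far above its character error rate.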

Cite

Alrumiah, S. S., & Al-Shargabi, A. A. (2023). A Deep Diacritics-Based Recognition Model for Arabic Speech: Quranic Verses as Case Study. IEEE Access, 11, 81348–81360. https://doi.org/10.1109/ACCESS.2023.3300972
