End-to-End Multi-dialect Malayalam Speech Recognition Using Deep-CNN, LSTM-RNN, and Machine Learning Approaches


Abstract

Research in Malayalam speech recognition is constrained by the scarcity of speech data. Accent variation poses one of the greatest challenges for automatic speech recognition (ASR) in any language. Malayalam, spoken in Kerala, the southernmost state of India, has a wide range of accents that reflect regional, cultural, and religious differences. Because Malayalam is a low-resource language, few ASR systems have been proposed for it, which makes this work both significant and challenging. Most previous Malayalam ASR experiments use traditional HMM methods, and no benchmark accented dataset is available for research, so the authors constructed accent-based data for this experiment. The proposed methodology comprises three distinct stages: dataset preparation, feature engineering, and classification using machine learning and deep learning approaches. A hybrid approach is adopted for feature engineering: several feature extraction techniques are applied to the input accent-based speech signals to obtain the best representation of the data, namely mel frequency cepstral coefficients (MFCC), the short-time Fourier transform (STFT), and mel spectrograms. These features are then used to build machine learning models with multi-layer perceptron, decision tree, support vector machine, random forest, k-nearest neighbor, and stochastic gradient descent classifiers. In the deep learning approach, the feature set is first fed to an LSTM-RNN architecture to construct the accented ASR system. In a further approach, spectrograms of the speech signals are plotted so that the speech data are represented as images; features extracted from these spectrograms are fed into a deep convolutional network architecture to build a deep learning model.
Finally, a hybrid ASR system is constructed from all the independent models, and the results of the experiments are compared against one another to identify the better approach for modeling the end-to-end accented ASR (AASR) system.
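The STFT-based feature engineering stage described above can be illustrated with a minimal NumPy sketch that converts a waveform into a magnitude spectrogram (the 25 ms frame length, 10 ms hop, and 512-point FFT used here are common defaults chosen for illustration, not settings reported in the paper):

```python
import numpy as np

def stft_magnitude(signal, frame_len=400, hop=160, n_fft=512):
    """Magnitude spectrogram via the short-time Fourier transform.

    Slices the signal into overlapping frames, applies a Hann window,
    and takes the real FFT of each frame. At a 16 kHz sampling rate,
    frame_len=400 and hop=160 correspond to 25 ms windows with 10 ms hops.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

# Synthetic stand-in for a speech signal: a 1-second 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)

spec = stft_magnitude(sig)
print(spec.shape)  # (frames, frequency bins) = (98, 257)
```

In practice, the mel spectrogram and MFCC features mentioned in the abstract are derived from exactly this kind of magnitude spectrogram, by applying a mel filterbank and (for MFCCs) a log and discrete cosine transform; audio libraries such as librosa package these steps as single function calls.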

Citation (APA)

Thandil, R. K., Mohamed Basheer, K. P., & Muneer, V. K. (2023). End-to-End Multi-dialect Malayalam Speech Recognition Using Deep-CNN, LSTM-RNN, and Machine Learning Approaches. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 163, pp. 37–49). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-0609-3_3
