End-to-End Multi-dialect Malayalam Speech Recognition Using Deep-CNN, LSTM-RNN, and Machine Learning Approaches


Abstract

Research in Malayalam speech recognition is constrained by the scarcity of speech data. Accent variation poses one of the greatest challenges for automatic speech recognition (ASR) in any language. Malayalam, spoken in Kerala, the southernmost state of India, has a wide range of accents that reflect regional, cultural, and religious differences. Because Malayalam is a low-resource language, few ASR systems have been proposed for it, which makes this work both significant and challenging. Most previous Malayalam ASR experiments use traditional HMM methods, and no benchmark accented dataset is available for research, so the authors constructed accent-based data for this experiment. The proposed methodology comprises three distinct stages: dataset preparation, feature engineering, and classification using machine learning and deep learning approaches. A hybrid approach is adopted for feature engineering: several feature extraction techniques are applied to the input accent-based speech signals to obtain the best representation of the data, namely mel frequency cepstral coefficients (MFCC), the short-time Fourier transform (STFT), and mel spectrograms. These features are then used to build machine learning models with multi-layer perceptron, decision tree, support vector machine, random forest, k-nearest neighbor, and stochastic gradient descent classifiers. In the deep learning approach, the feature set is first fed to an LSTM-RNN architecture to construct the accented ASR system. In a further approach, spectrograms of the speech signals are plotted so that the speech data are represented as images; features extracted from these spectrograms are fed into a deep convolutional network architecture to build a deep learning model.
Finally, a hybrid ASR system is constructed from all the independent models, and the results of the experiments are compared against one another to identify the better approach for modeling the end-to-end accented ASR (AASR) system.
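The STFT-based feature engineering stage described above can be illustrated with a minimal NumPy sketch that converts a waveform into a magnitude spectrogram (the 25 ms frame length, 10 ms hop, and 512-point FFT used here are common defaults chosen for illustration, not settings reported in the paper):

```python
import numpy as np

def stft_magnitude(signal, frame_len=400, hop=160, n_fft=512):
    """Magnitude spectrogram via the short-time Fourier transform.

    Slices the signal into overlapping frames, applies a Hann window,
    and takes the real FFT of each frame. At a 16 kHz sampling rate,
    frame_len=400 and hop=160 correspond to 25 ms windows with 10 ms hops.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

# Synthetic stand-in for a speech signal: a 1-second 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
sig = np.sin(2 * np.pi * 440 * t)

spec = stft_magnitude(sig)
print(spec.shape)  # (frames, frequency bins) = (98, 257)
```

In practice, the mel spectrogram and MFCC features mentioned in the abstract are derived from exactly this kind of magnitude spectrogram, by applying a mel filterbank and (for MFCCs) a log and discrete cosine transform; audio libraries such as librosa package these steps as single function calls.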

Citation (APA)

Thandil, R. K., Mohamed Basheer, K. P., & Muneer, V. K. (2023). End-to-End Multi-dialect Malayalam Speech Recognition Using Deep-CNN, LSTM-RNN, and Machine Learning Approaches. In Lecture Notes on Data Engineering and Communications Technologies (Vol. 163, pp. 37–49). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-0609-3_3
