Mispronunciation Detection for Spoken Isolated Words using Segmentation and Classification under Low Resource Conditions for Kannada Language

Abstract

People who relocate inevitably need to learn the local pronunciations correctly. With the advent of mobile phones, language learning can be made easy and flexible. Our research involves Kannada Kali, a mobile and cloud-based application that is being developed to help people learn the correct pronunciation of Kannada (a language spoken in India). Automatic Speech Recognition systems used to aid pronunciation training need to be trained on a sufficient amount of spoken target-language data. Since collecting such data is not easy, the objective of our research is to detect mispronounced segments of words with minimal data. When data is scarce, a comparative approach, in which spoken word segments are compared with the canonical pronunciations, is more effective for detecting anomalies in pronunciation. Since syllables are basic independent units of pronunciation, the spoken words are segmented into syllables for effective comparison and feedback. We propose an unsupervised segmentation method called Spectrogram Formant Contour Analysis that detects syllable boundaries by analysing changes in the contours of the formants in the spoken-word spectrograms. Mispronunciation detection is more effective when the application can identify the syllable actually pronounced and communicate the correct pronunciation to the user. For syllable classification, our method employs a novel approach in which a model is trained on phonemes and then given syllables as input for identification. Our study compares the performance of three machine learning algorithms, namely Convolutional Neural Network (CNN), Support Vector Machines (SVM) and K-Nearest Neighbours (KNN), on the task of identifying phonemes when trained on minimal data. KNN gave the best phoneme classification accuracy, with 80% on clean data and 60% on noisy data. In our initial results on syllable classification for Kannada Kali, Support Vector Machines gave the highest accuracy, at almost 30%.
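
The abstract reports K-Nearest Neighbours as the strongest phoneme classifier when training data is minimal. As a rough illustration of why a nearest-neighbour classifier can work with only a handful of labelled examples, the sketch below classifies a sound by majority vote among its closest training points. Everything specific here is an assumption made for illustration: the function name `knn_predict`, the choice of two formant frequencies (F1, F2) as features, and the invented toy values. It is not the authors' feature set, data, or implementation.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical (F1, F2) values in Hz, invented for this example only.
train = [
    ((700.0, 1100.0), "a"),
    ((750.0, 1200.0), "a"),
    ((680.0, 1050.0), "a"),
    ((300.0, 2300.0), "i"),
    ((280.0, 2200.0), "i"),
    ((320.0, 2250.0), "i"),
    ((310.0, 900.0), "u"),
    ((330.0, 850.0), "u"),
    ((290.0, 950.0), "u"),
]

print(knn_predict(train, (720.0, 1150.0)))
print(knn_predict(train, (300.0, 2100.0)))
```

KNN has no training phase beyond storing the examples, so it has few parameters to fit when data is scarce. In practice the features would be scaled before computing distances, and `k` would be chosen on held-out data.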

Cite

Murthy, S., Suresh, P., … Sitaram, D. (2019). Mispronunciation Detection for Spoken Isolated Words using Segmentation and Classification under Low Resource Conditions for Kannada Language. International Journal of Recent Technology and Engineering (IJRTE), 8(4), 11874–11882. https://doi.org/10.35940/ijrte.d9589.118419
