Cross-Lingual Speaker Identification for Indian Languages

Abstract

This paper introduces a cross-lingual speaker identification system for Indian languages based on a Long Short-Term Memory dense neural network (LSTM-DNN). The system was trained on English audio recordings and evaluated on Hindi, Kannada, Malayalam, Tamil, and Telugu data to examine how factors such as phonetic similarity and native accent affect performance. The model takes as input mel-frequency cepstral coefficient (MFCC) features extracted from the audio files. For comparison, the corresponding mel-spectrogram images were used as input to a ResNet-50 model, and the raw audio was used to train a Siamese network. The LSTM-DNN outperformed both of these models as well as two more traditional baseline speaker identification models, showing that deep learning models are superior to probabilistic models for capturing low-level speech features and learning speaker characteristics.
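
The abstract does not give the feature-extraction settings. As a rough illustration of what "mel-frequency" means for the MFCC and mel-spectrogram inputs, the sketch below spaces filter centers evenly on the mel scale using the widely used formula mel = 2595 * log10(1 + f / 700). The function names, the 26 filters, and the 0 to 8000 Hz range are assumptions chosen for illustration, not values taken from the paper.

```python
import math
from typing import List


def hz_to_mel(hz: float) -> float:
    """Convert a frequency in Hz to mels (common 2595 * log10 variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Center frequencies (Hz) of n_filters filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


# Hypothetical settings: 26 filters covering 0 to 8000 Hz.
centers = mel_center_frequencies(n_filters=26, f_min=0.0, f_max=8000.0)
```

Because the spacing is uniform in mels, the centers sit close together at low frequencies and spread apart at high frequencies, which is the perceptual warping that MFCC features and mel-spectrograms share.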

Cite

Rizvi, A., Jamatia, A., Rudrapal, D., Chakma, K., & Gambäck, B. (2023). Cross-Lingual Speaker Identification for Indian Languages. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 979–987). Incoma Ltd. https://doi.org/10.26615/978-954-452-092-2_105
