Abstract
The paper introduces a cross-lingual speaker identification system for Indian languages, utilising a Long Short-Term Memory dense neural network (LSTM-DNN). The system was trained on audio recordings in English and evaluated on data from Hindi, Kannada, Malayalam, Tamil, and Telugu, to examine how factors such as phonetic similarity and native accent affect performance. The model takes as input mel-frequency cepstral coefficient (MFCC) features extracted from the audio files. For comparison, the corresponding mel-spectrogram images were used as input to a ResNet-50 model, and the raw audio was used to train a Siamese network. The LSTM-DNN model outperformed both of these models as well as two traditional baseline speaker identification models, indicating that deep learning models capture low-level speech features and learn speaker characteristics better than probabilistic models.
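To make the MFCC input concrete, the following is a minimal pure-Python sketch of how MFCC features can be computed for a single audio frame. It is not the authors' pipeline: the abstract does not give the extraction settings, so the frame length, the Hamming window, the 26 mel filters, the 13 coefficients, and all function names here are illustrative assumptions. In practice a dedicated audio library would normally be used instead.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to the mel scale (common HTK-style formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mel_points = [top * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate)) for m in mel_points]
    n_bins = n_fft // 2 + 1
    bank = []
    for j in range(1, n_filters + 1):
        left, centre, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, centre):      # rising slope
            row[k] = (k - left) / (centre - left)
        for k in range(centre, right):     # falling slope
            row[k] = (right - k) / (right - centre)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, n_filters=26, n_coeffs=13):
    """MFCCs for one frame: power spectrum -> mel energies -> log -> DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Small constant avoids log(0) for empty or silent bands.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


# Illustrative input only: a synthetic 440 Hz tone, 256 samples at 16 kHz.
demo_frame = [math.sin(2.0 * math.pi * 440.0 * t / 16000.0) for t in range(256)]
demo_mfcc = mfcc_frame(demo_frame, 16000)
```

Stacking such per-frame vectors over time gives the kind of sequence a recurrent model like an LSTM can consume, whereas a mel-spectrogram image keeps the log mel energies without the final cosine transform.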
Cite
Rizvi, A., Jamatia, A., Rudrapal, D., Chakma, K., & Gambäck, B. (2023). Cross-Lingual Speaker Identification for Indian Languages. In International Conference Recent Advances in Natural Language Processing, RANLP (pp. 979–987). Incoma Ltd. https://doi.org/10.26615/978-954-452-092-2_105