An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems

Abstract

In this paper, we introduce an enhancement for speech recognition systems using an unsupervised speaker clustering technique. The proposed technique is based mainly on I-vectors and a Self-Organizing Map (SOM) neural network. The input to the proposed algorithm is a set of speech utterances. For each utterance, we extract a 100-dimensional I-vector, and a SOM is then used to group the utterances by speaker. In our experiments, we compared our technique with Normalized Cross Likelihood Ratio (NCLR) clustering. Results show that the proposed technique reduces the speaker error rate in comparison with NCLR. Finally, we evaluated the effect of speaker clustering on Speaker Adaptive Training (SAT) in a speech recognition system implemented to test the performance of the proposed technique. The proposed technique reduced the word error rate (WER) compared with clustering speakers using NCLR.
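The abstract does not give the SOM configuration (grid size, learning-rate schedule, neighbourhood function, initialisation) or say how SOM units are mapped to speakers. The sketch below is therefore a generic Kohonen SOM written in plain Python under stated assumptions: the I-vectors have already been extracted (extraction is not shown), each map unit is treated as one speaker cluster, and the function name `som_cluster`, the farthest-point initialisation, and the linear decay schedules are hypothetical illustrative choices, not the authors' implementation.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _farthest_point_init(vectors: Sequence[Sequence[float]], n_units: int) -> List[Vector]:
    """Hypothetical initialisation: farthest-point sampling over the data."""
    weights = [list(vectors[0])]
    while len(weights) < n_units:
        # Pick the vector whose nearest existing unit weight is farthest away.
        best = max(vectors, key=lambda v: min(_sq_dist(v, w) for w in weights))
        weights.append(list(best))
    return weights


def som_cluster(
    vectors: Sequence[Sequence[float]],
    grid_w: int,
    grid_h: int,
    epochs: int = 50,
    lr0: float = 0.5,
    seed: int = 0,
) -> List[int]:
    """Train a small SOM on I-vectors; return one unit index (cluster label) per vector.

    Assumption: each SOM unit stands for one speaker cluster.
    """
    n_units = grid_w * grid_h
    # Position of each unit on the 2-D map lattice; neighbourhoods use this grid.
    coords: List[Tuple[int, int]] = [(k % grid_w, k // grid_w) for k in range(n_units)]
    weights = _farthest_point_init(vectors, n_units)
    sigma0 = max(grid_w, grid_h) / 2.0
    rng = random.Random(seed)
    order = list(range(len(vectors)))
    total_steps = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        rng.shuffle(order)  # present utterances in a new random order each epoch
        for i in order:
            x = vectors[i]
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                   # learning rate decays linearly to 0
            sigma = max(sigma0 * (1.0 - frac), 0.01)  # neighbourhood radius shrinks
            # Best-matching unit (BMU): the unit whose weight vector is closest to x.
            bmu = min(range(n_units), key=lambda k: _sq_dist(x, weights[k]))
            bx, by = coords[bmu]
            for k, (cx, cy) in enumerate(coords):
                grid_d2 = (cx - bx) ** 2 + (cy - by) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))  # Gaussian neighbourhood
                w = weights[k]
                for d in range(len(w)):
                    w[d] += lr * h * (x[d] - w[d])  # pull unit k toward x
            step += 1
    # After training, label each utterance with its best-matching unit.
    return [min(range(n_units), key=lambda k: _sq_dist(x, weights[k])) for x in vectors]


# Usage with synthetic 100-dimensional vectors standing in for two speakers' I-vectors.
data_rng = random.Random(1)
speaker_a = [[1.0 + data_rng.gauss(0, 0.1) for _ in range(100)] for _ in range(10)]
speaker_b = [[-1.0 + data_rng.gauss(0, 0.1) for _ in range(100)] for _ in range(10)]
labels = som_cluster(speaker_a + speaker_b, grid_w=2, grid_h=1)
```

With a 2×1 map each unit ends up covering one synthetic speaker. On a larger map several units may cover the same speaker, so a practical system would likely need an extra step (for example, merging neighbouring units); the abstract does not describe such a step, and it is omitted here.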

Cite

Ahmed, H., Elaraby, M. S., Moussa, A. M., Abdallah, M., Abdou, S. M., & Rashwan, M. (2017). An Unsupervised Speaker Clustering Technique based on SOM and I-vectors for Speech Recognition Systems. In WANLP 2017, co-located with EACL 2017 - 3rd Arabic Natural Language Processing Workshop, Proceedings of the Workshop (pp. 79–83). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/W17-1310
