Analysis and Investigation of Speaker Identification Problems Using Deep Learning Networks and the YOHO English Speech Dataset

Abstract

The rapid momentum of deep neural networks (DNNs) in recent years has yielded state-of-the-art performance in various machine-learning tasks, including speaker identification. Speaker identification relies on speech signals and the features that can be extracted from them. In this article, we propose a speaker identification system built on DNN models that we developed. The system uses acoustic and prosodic features of the speech signal, such as pitch frequency (the vibration rate of the vocal cords), energy (the loudness of speech), their derivatives, and additional acoustic and prosodic features. The article also investigates existing recurrent neural network (RNN) models and adapts them to design a speaker identification system using the public YOHO LDC dataset. In the best experiment, the system achieved an average speaker identification accuracy of 91.93%. Furthermore, the paper analyzes the speakers and tokens that yield the major errors, helping to uncover the reasons behind them and to increase the system's robustness through feature selection and system tuning.

Cite

Almarshady, N. M., Alashban, A. A., & Alotaibi, Y. A. (2023). Analysis and Investigation of Speaker Identification Problems Using Deep Learning Networks and the YOHO English Speech Dataset. Applied Sciences (Switzerland), 13(17). https://doi.org/10.3390/app13179567