Emotion Speech Recognition using Deep Learning

Abstract

Emotion Speech Recognition (ESR) is the task of recognizing the formation and change of a speaker’s emotional state from his or her speech signal. The main purpose of this field is to produce a convenient system that can effortlessly communicate and interact with humans. The reliability of current speech emotion recognition systems is still far from being achieved, since the task is challenging due to the gap between acoustic features and human emotions and depends strongly on the discriminative acoustic features extracted for a given recognition task. Deep learning techniques have recently been proposed as an alternative to traditional techniques in ESR. In this paper, an overview of deep learning techniques that can be used in emotional speech recognition is presented. Commonly extracted features such as MFCCs, as well as classification methods including HMM, GMM, LSTM, and ANN, are discussed. In addition, the review covers the databases used, the emotions extracted, and the contributions made toward ESR.
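
The paper itself is a survey and provides no code; purely as an illustration, the sketch below shows how the pipeline the abstract describes (MFCC features fed to an LSTM classifier) might be assembled in Python. The librosa/TensorFlow usage, the four-emotion label set, and all sizes are assumptions, not details taken from the paper.

# Minimal sketch (not from the paper): MFCC extraction + LSTM emotion classifier.
# File paths, emotion labels, and network sizes are illustrative assumptions.
import numpy as np
import librosa
import tensorflow as tf

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set

def extract_mfcc(path, sr=16000, n_mfcc=40, max_frames=200):
    """Load a speech clip and return a fixed-length (max_frames, n_mfcc) MFCC matrix."""
    signal, _ = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)
    if mfcc.shape[0] < max_frames:                                 # pad short clips with zeros
        mfcc = np.pad(mfcc, ((0, max_frames - mfcc.shape[0]), (0, 0)))
    return mfcc[:max_frames]

def build_lstm_classifier(max_frames=200, n_mfcc=40, n_classes=len(EMOTIONS)):
    """Small LSTM network mapping an MFCC sequence to one of the emotion classes."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(max_frames, n_mfcc)),
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

# Usage sketch, assuming X is an array of stacked MFCC matrices and y the integer labels:
# model = build_lstm_classifier()
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(X, y, epochs=20, batch_size=32)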

Citation (APA)

Khalifa, O. O., Alhamada, M. I., & Abdalla, A. H. (2020). Emotion Speech Recognition using Deep Learning. Majlesi Journal of Electrical Engineering, 14(4), 45–54. https://doi.org/10.29252/mjee.14.4.39
