Emotions play an extremely important role in human decisions and in interactions with both other humans and machines. This fact has promoted the development of methods that aim to recognize emotions from different physiological signals. In particular, emotion recognition from speech signals remains a research challenge due to the large voice variability between subjects. In this work, paralinguistic features and deep learning models are used to perform speech emotion classification. A set of 1582 INTERSPEECH 2010 features is first extracted from the speech signals and fed to a deep convolutional stacked auto-encoder network that transforms those features into a higher-level representation. A multilayer perceptron is then trained to classify each utterance into one of six emotions: anger, fear, disgust, happiness, surprise, and sadness. Four auto-encoder architectures of different sizes were evaluated in terms of performance, computational cost, and execution time to obtain the most suitable configuration. The proposed approach was evaluated in two stages. First, a 5-fold cross-validation strategy was performed using 70% of the samples. Then, the best network architecture was used to evaluate classification on a validation set composed of the remaining 30% of the samples. Results report an overall accuracy of 91.4% in the 5-fold testing stage and 61.1% on the validation set.
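The pipeline described above (feature vector → auto-encoder encoding → MLP classifier, with a 70/30 split and 5-fold cross-validation) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it uses random synthetic data in place of real INTERSPEECH 2010 features, a single-layer scikit-learn auto-encoder in place of the deep convolutional stacked auto-encoder, and assumed layer sizes (64, 32).

```python
import numpy as np
from sklearn.neural_network import MLPRegressor, MLPClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 1582 INTERSPEECH 2010 paralinguistic features;
# the paper extracts these from real speech signals.
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 240, 1582, 6
X = rng.normal(size=(n_samples, n_features))
y = rng.integers(0, n_classes, size=n_samples)

X = StandardScaler().fit_transform(X)
# 70/30 split mirroring the paper's evaluation protocol.
X_tr, X_val, y_tr, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# Auto-encoder step: an MLP trained to reconstruct its own input;
# its hidden layer serves as the learned higher-level representation.
ae = MLPRegressor(hidden_layer_sizes=(64,), activation="relu",
                  max_iter=60, random_state=0)
ae.fit(X_tr, X_tr)

def encode(model, data):
    # Hidden-layer (ReLU) activation of the trained auto-encoder.
    return np.maximum(0.0, data @ model.coefs_[0] + model.intercepts_[0])

Z_tr, Z_val = encode(ae, X_tr), encode(ae, X_val)

# MLP classifier on the encoded features: 5-fold CV on the training
# portion, then a single evaluation on the held-out validation set.
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=120, random_state=0)
cv_acc = cross_val_score(clf, Z_tr, y_tr, cv=5).mean()
clf.fit(Z_tr, y_tr)
val_acc = clf.score(Z_val, y_val)
print(f"5-fold CV accuracy: {cv_acc:.3f}, validation accuracy: {val_acc:.3f}")
```

With random labels the accuracies hover near chance (about 1/6); the point of the sketch is only the two-stage structure: the auto-encoder is fit on inputs alone, and the classifier sees only the encoded features.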
Citation
Fonnegra, R. D., & Díaz, G. M. (2018). Speech emotion recognition integrating paralinguistic features and auto-encoders in a deep learning model. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10901 LNCS, pp. 385–396). Springer Verlag. https://doi.org/10.1007/978-3-319-91238-7_31