Learning an Unsupervised and Interpretable Representation of Emotion from Speech

Abstract

One of the severe obstacles to naturalistic human affective computing is that emotions are complex constructs with fuzzy boundaries and substantial individual variation. Thus, an important issue to be considered in emotion analysis is generating a person-specific representation of emotion in an unsupervised manner. This paper presents a fully unsupervised method combining an autoencoder with Principal Component Analysis to build an emotion representation from speech signals. As each person has a different way of expressing emotions, the method is applied at the subject level. We also investigate the relevance of such a representation. Experiments on the Emo-DB, IEMOCAP, and SEMAINE databases show that the proposed representation of emotion is invariant across subjects and similar to the representation built by psychologists, especially on the arousal dimension.
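The pipeline described above (an autoencoder to learn per-subject latent features, followed by Principal Component Analysis to obtain an ordered, interpretable basis) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the input features, network size, and training loop are hypothetical stand-ins, and real speech input would use acoustic descriptors rather than random data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for one subject's frame-level acoustic features;
# a real system would use speech descriptors extracted from audio.
X = rng.normal(size=(200, 16))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize features

# --- Tiny linear autoencoder trained by gradient descent (sketch) ---
d_in, d_hid = X.shape[1], 8
W_enc = rng.normal(scale=0.1, size=(d_in, d_hid))
W_dec = rng.normal(scale=0.1, size=(d_hid, d_in))
lr = 0.01
for _ in range(500):
    Z = X @ W_enc          # encode to latent space
    X_hat = Z @ W_dec      # decode back to feature space
    err = X_hat - X        # reconstruction error
    # Gradients of the mean squared reconstruction loss
    g_dec = Z.T @ err / len(X)
    g_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * g_dec
    W_enc -= lr * g_enc

# --- PCA on the latent codes: order axes by explained variance ---
Z = X @ W_enc
Zc = Z - Z.mean(axis=0)
cov = Zc.T @ Zc / (len(Zc) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # sort descending by variance
components = eigvecs[:, order]

# Project onto the top components; under the paper's interpretation, the
# leading axes would align with affective dimensions such as arousal.
emotion_repr = Zc @ components[:, :2]
```

Because the autoencoder is trained per subject and PCA needs no labels, the whole procedure stays unsupervised, matching the subject-level setup the abstract describes.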

APA

Wang, S., Soladié, C., & Séguier, R. (2020). Learning an Unsupervised and Interpretable Representation of Emotion from Speech. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12335 LNAI, pp. 636–645). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60276-5_61
