One of the major obstacles to naturalistic human affective computing is that emotions are complex constructs with fuzzy boundaries and substantial individual variation. An important issue in emotion analysis is therefore generating a person-specific representation of emotion in an unsupervised manner. This paper presents a fully unsupervised method combining an autoencoder with Principal Component Analysis to build an emotion representation from speech signals. As each person has a different way of expressing emotions, the method is applied at the subject level. We also investigate the relevance of the resulting representation. Experiments on the Emo-DB, IEMOCAP, and SEMAINE databases show that the proposed representation of emotion is invariant across subjects and similar to the representation built by psychologists, especially on the arousal dimension.
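The pipeline the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the actual architecture, speech features, and training details differ, and all names and dimensions below are assumptions. Per subject, an autoencoder compresses speech features into a latent code, then PCA orients that latent space along its principal axes:

```python
# Illustrative sketch only: a tiny linear autoencoder trained on one
# subject's speech features, followed by PCA on the latent codes.
# Features, dimensions, and hyperparameters are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def train_autoencoder(X, n_latent=2, lr=0.01, epochs=500):
    """Train a tiny linear autoencoder X -> z -> X_hat by gradient descent."""
    n, d = X.shape
    W_enc = rng.normal(scale=0.1, size=(d, n_latent))
    W_dec = rng.normal(scale=0.1, size=(n_latent, d))
    for _ in range(epochs):
        Z = X @ W_enc              # encode features into the latent space
        X_hat = Z @ W_dec          # decode back to feature space
        err = X_hat - X            # reconstruction error
        # gradients of the mean squared reconstruction loss
        g_dec = Z.T @ err / n
        g_enc = X.T @ (err @ W_dec.T) / n
        W_dec -= lr * g_dec
        W_enc -= lr * g_enc
    return W_enc

def pca_axes(Z):
    """Principal axes of the latent codes via SVD of the centered matrix."""
    Zc = Z - Z.mean(axis=0)
    _, _, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Vt                      # rows are orthonormal principal directions

# Toy stand-in for one subject's frame-level acoustic features.
X = rng.normal(size=(200, 8))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each feature

W_enc = train_autoencoder(X)
Z = X @ W_enc                      # subject-specific latent emotion codes
axes = pca_axes(Z)                 # axes one could inspect against e.g. arousal
print(Z.shape, axes.shape)
```

In this setup the PCA step makes the learned latent space interpretable: each principal axis captures a dominant direction of variation in the subject's codes, which the paper compares against psychological dimensions such as arousal.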
Wang, S., Soladié, C., & Séguier, R. (2020). Learning an Unsupervised and Interpretable Representation of Emotion from Speech. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12335 LNAI, pp. 636–645). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60276-5_61