Multimodal emotion recognition based on the decoupling of emotion and speaker information

Abstract

The standard features used in emotion recognition carry, in addition to emotion-related information, cues about the speaker's identity. This is expected, since the variability introduced by emotionally colored speech is similar in nature to the variability in the speech signal caused by different speakers. Therefore, we present a gradient-descent-derived transformation for decoupling the emotion and speaker information contained in the acoustic features. The Interspeech '09 Emotion Challenge feature set is used as the baseline for the audio part. A similar procedure is employed on the video signal, where the nuisance attribute projection (NAP) is used to derive the transformation matrix, which retains information about the emotional state of the speaker. Finally, different NAP transformation matrices are compared using canonical correlations. The audio and video sub-systems are combined at the matching-score level using different fusion techniques. The presented system is assessed on the publicly available eNTERFACE '05 database, where significant improvements in recognition performance are observed compared to the state-of-the-art baseline. © 2010 Springer-Verlag Berlin Heidelberg.
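
To illustrate the two building blocks named in the abstract, the following Python sketch shows a generic NAP-style projection that suppresses the dominant within-speaker (nuisance) directions of a feature space, together with a simple weighted-sum fusion of matching scores. The function names, the eigen-decomposition-based formulation, and the fixed-weight fusion rule are illustrative assumptions, not the exact procedure described in the paper.

```python
# Minimal sketch of nuisance attribute projection (NAP), assuming the
# nuisance to be suppressed is the speaker identity. X has one row per
# sample; `speakers` holds the speaker label of each sample.
import numpy as np

def nap_projection(X, speakers, k):
    """Return P = I - U U^T, where U spans the top-k within-speaker
    (nuisance) variability directions of the feature space."""
    X = np.asarray(X, dtype=float)
    speakers = np.asarray(speakers)
    # Within-speaker scatter: deviations of each sample from its speaker mean.
    S = np.zeros((X.shape[1], X.shape[1]))
    for spk in np.unique(speakers):
        Xs = X[speakers == spk]
        D = Xs - Xs.mean(axis=0)
        S += D.T @ D
    # Top-k eigenvectors of the nuisance scatter define the directions to remove.
    eigvals, eigvecs = np.linalg.eigh(S)
    U = eigvecs[:, np.argsort(eigvals)[::-1][:k]]
    return np.eye(X.shape[1]) - U @ U.T

# Matching-score-level fusion of the audio and video sub-systems,
# shown here as a weighted sum (one of several possible fusion rules).
def fuse_scores(audio_scores, video_scores, w=0.5):
    return w * np.asarray(audio_scores) + (1.0 - w) * np.asarray(video_scores)
```

Applying the returned projection as `X @ nap_projection(X, speakers, k)` removes the k strongest speaker-dependent directions while leaving the remaining (emotion-carrying) variability intact; the fusion weight `w` is a free parameter one would tune on held-out data.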

Citation (APA)

Gajšek, R., Štruc, V., & Mihelič, F. (2010). Multimodal emotion recognition based on the decoupling of emotion and speaker information. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6231 LNAI, pp. 275–282). https://doi.org/10.1007/978-3-642-15760-8_35
