In Affective Computing, a mathematical representation of emotions in the computer is desirable for emotionally interactive agents. This study aims to obtain a latent representation of emotions (an emotional space) common to multiple modalities, motivated by the fact that humans can recognize emotions from multiple modalities. We define the emotional space as the latent space of a multimodal DNN model and propose embedding emotional information into a hemi-hyperspherical space. Our proposed model fuses the emotional spaces of the individual modalities in an element-wise weighted-average fashion. We train the model by combining an emotion recognition task with a latent space unification task. The unification task minimizes the distance between the emotional spaces produced by different modalities for the same input, which encourages the modalities to share a similar latent space. Experiments on audio-visual data evaluate the robustness of emotion recognition against missing modalities. The results confirm that the proposed method, especially with low-dimensional hemi-hyperspherical representations, can acquire a shared representation of emotion across modalities.
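The fusion and unification ideas summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the projection onto a hemi-hypersphere is assumed here to be taking absolute values (restricting to one half of the sphere) followed by L2 normalization, and the function names (`embed_hemisphere`, `fuse`, `unification_loss`) are hypothetical.

```python
import numpy as np

def embed_hemisphere(x):
    # Hypothetical hemi-hyperspherical projection: absolute values keep
    # every coordinate non-negative (one half of the sphere), then
    # L2 normalization places the vector on the unit sphere.
    h = np.abs(x)
    return h / np.linalg.norm(h, axis=-1, keepdims=True)

def fuse(z_audio, z_visual, w):
    # Element-wise weighted average of the two per-modality embeddings;
    # w is a weight (scalar or per-dimension) in [0, 1].
    return w * z_audio + (1.0 - w) * z_visual

def unification_loss(z_audio, z_visual):
    # Mean squared distance between the two modality embeddings of the
    # same input; minimizing it pulls the per-modality spaces together.
    return np.mean(np.sum((z_audio - z_visual) ** 2, axis=-1))
```

Because both modalities are driven toward the same region of the sphere, either embedding (or their fused average) can stand in when the other modality is missing, which is the robustness property the experiments evaluate.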
Harata, S., Sakuma, T., & Kato, S. (2022). Audio-Visual Shared Emotion Representation for Robust Emotion Recognition on Modality Missing Using Hemi-hyperspherical Embedding and Latent Space Unification. In Communications in Computer and Information Science (Vol. 1581 CCIS, pp. 137–143). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-06388-6_18