Hierarchical Multimodal Transformer with Localness and Speaker Aware Attention for Emotion Recognition in Conversations

Abstract

Emotion Recognition in Conversations (ERC) aims to predict the emotion of each utterance in a given conversation. Existing approaches to ERC suffer mainly from two drawbacks: (1) they pay too little attention to the emotional impact of the local context; (2) they ignore the emotional inertia of speakers. To address these limitations, we first propose a Hierarchical Multimodal Transformer as our base model, and then design a localness-aware attention mechanism and a speaker-aware attention mechanism to capture the impact of the local context and the emotional inertia, respectively. Extensive evaluations on a benchmark dataset demonstrate the superiority of our proposed model over existing multimodal methods for ERC.
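
The abstract names the two attention mechanisms but does not give their formulation. As a rough illustration of the general idea only, the sketch below restricts utterance-level attention with two boolean masks: one that keeps only nearby utterances (local context) and one that keeps only utterances from the same speaker (emotional inertia). The hard masking, the `window` parameter, and all function names are assumptions made for this example and are not taken from the paper, which may use a different (for instance, soft or learned) scheme.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list; -inf entries get weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention_weights(scores, allowed):
    """Row-wise softmax of `scores`, restricted to positions where `allowed` is True.

    `scores[i][j]` is the raw attention score of utterance i towards utterance j.
    Each row of `allowed` must contain at least one True entry.
    """
    weights = []
    for row, mask in zip(scores, allowed):
        masked = [s if ok else float("-inf") for s, ok in zip(row, mask)]
        weights.append(softmax(masked))
    return weights


def local_mask(n, window):
    """Hypothetical localness mask: utterance i may attend to j only if |i - j| <= window."""
    return [[abs(i - j) <= window for j in range(n)] for i in range(n)]


def speaker_mask(speakers):
    """Hypothetical speaker mask: utterance i may attend only to utterances by the same speaker."""
    return [[a == b for b in speakers] for a in speakers]


# Toy conversation: five utterances, two alternating speakers, uniform raw scores.
speakers = ["A", "B", "A", "B", "A"]
n = len(speakers)
scores = [[0.0] * n for _ in range(n)]

local_weights = masked_attention_weights(scores, local_mask(n, window=1))
speaker_weights = masked_attention_weights(scores, speaker_mask(speakers))
```

With uniform scores, each row of `local_weights` spreads its weight evenly over the utterance and its immediate neighbours, while each row of `speaker_weights` spreads it evenly over that speaker's own utterances. In a real model the raw scores would come from query-key products over utterance representations.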

Cite

Jin, X., Yu, J., Ding, Z., Xia, R., Zhou, X., & Tu, Y. (2020). Hierarchical Multimodal Transformer with Localness and Speaker Aware Attention for Emotion Recognition in Conversations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12431 LNAI, pp. 41–53). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60457-8_4
