Emotion Recognition in Conversations (ERC) aims to predict the emotion of each utterance in a given conversation. Existing approaches to the ERC task mainly suffer from two drawbacks: (1) they fail to pay enough attention to the emotional impact of the local context; (2) they ignore the effect of speakers' emotional inertia. To tackle these limitations, we first propose a Hierarchical Multimodal Transformer as our base model, and then carefully design a localness-aware attention mechanism and a speaker-aware attention mechanism to capture, respectively, the impact of the local context and of emotional inertia. Extensive evaluations on a benchmark dataset demonstrate the superiority of our proposed model over existing multimodal methods for ERC.
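The abstract does not spell out how the two attention mechanisms are implemented. As an illustrative assumption only (not the paper's actual formulation), one common way to realize localness-aware and speaker-aware attention is to add two biases to the scaled dot-product attention logits over utterances: a Gaussian penalty on relative distance, which favors the local context, and a same-speaker bonus, which models emotional inertia. The function name, bias forms, and hyperparameters below are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def biased_utterance_attention(q, k, v, speakers, sigma=2.0, speaker_bonus=1.0):
    """Scaled dot-product attention over a conversation's utterances with
    two additive logit biases (an illustrative sketch, not the paper's model):
      - a Gaussian localness bias that favors nearby utterances;
      - a same-speaker bias that favors a speaker's own past utterances,
        approximating emotional inertia.
    q, k, v: (n, d) arrays of utterance representations; speakers: length-n ids.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                        # (n, n) attention logits
    pos = np.arange(n)
    dist = pos[:, None] - pos[None, :]                   # relative utterance distance
    local_bias = -(dist ** 2) / (2.0 * sigma ** 2)       # Gaussian localness penalty
    spk = np.asarray(speakers)
    spk_bias = speaker_bonus * (spk[:, None] == spk[None, :])  # same-speaker boost
    weights = softmax(scores + local_bias + spk_bias, axis=-1)
    return weights @ v, weights
```

With a small `sigma`, attention concentrates on adjacent utterances; raising `speaker_bonus` shifts mass toward each speaker's own utterances, so the two biases separately control the local-context and inertia effects described above.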
Jin, X., Yu, J., Ding, Z., Xia, R., Zhou, X., & Tu, Y. (2020). Hierarchical Multimodal Transformer with Localness and Speaker Aware Attention for Emotion Recognition in Conversations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12431 LNAI, pp. 41–53). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60457-8_4