Emotion Recognition in Conversations (ERC) aims to predict the emotion of each utterance in a given conversation. Existing approaches to the ERC task mainly suffer from two drawbacks: (1) they fail to pay enough attention to the emotional impact of the local context; (2) they ignore the effect of speakers' emotional inertia. To tackle these limitations, we first propose a Hierarchical Multimodal Transformer as our base model, and then carefully design a localness-aware attention mechanism and a speaker-aware attention mechanism to capture, respectively, the impact of the local context and of emotional inertia. Extensive evaluations on a benchmark dataset demonstrate the superiority of our proposed model over existing multimodal methods for ERC.
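The abstract does not spell out how the two attention mechanisms are implemented. As an illustrative assumption only (not the paper's actual formulation), one common way to realize localness-aware and speaker-aware attention is to add two biases to the scaled dot-product attention logits over utterances: a Gaussian penalty on relative distance, which favors the local context, and a same-speaker bonus, which models emotional inertia. The function name, bias forms, and hyperparameters below are hypothetical:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def biased_utterance_attention(q, k, v, speakers, sigma=2.0, speaker_bonus=1.0):
    """Scaled dot-product attention over a conversation's utterances with
    two additive logit biases (an illustrative sketch, not the paper's model):
      - a Gaussian localness bias that favors nearby utterances;
      - a same-speaker bias that favors a speaker's own past utterances,
        approximating emotional inertia.
    q, k, v: (n, d) arrays of utterance representations; speakers: length-n ids.
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)                        # (n, n) attention logits
    pos = np.arange(n)
    dist = pos[:, None] - pos[None, :]                   # relative utterance distance
    local_bias = -(dist ** 2) / (2.0 * sigma ** 2)       # Gaussian localness penalty
    spk = np.asarray(speakers)
    spk_bias = speaker_bonus * (spk[:, None] == spk[None, :])  # same-speaker boost
    weights = softmax(scores + local_bias + spk_bias, axis=-1)
    return weights @ v, weights
```

With a small `sigma`, attention concentrates on adjacent utterances; raising `speaker_bonus` shifts mass toward each speaker's own utterances, so the two biases separately control the local-context and inertia effects described above.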
Jin, X., Yu, J., Ding, Z., Xia, R., Zhou, X., & Tu, Y. (2020). Hierarchical Multimodal Transformer with Localness and Speaker Aware Attention for Emotion Recognition in Conversations. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12431 LNAI, pp. 41–53). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-60457-8_4