Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction

13 Citations
21 Readers (Mendeley users who have this article in their library)

Abstract

Emotion recognition is a crucial task for human conversation understanding. It becomes more challenging with multimodal data, e.g., language, voice, and facial expressions. As a typical solution, global and local context information is exploited to predict the emotional label for each utterance, i.e., every single sentence, in the dialogue. Specifically, the global representation can be captured by modeling cross-modal interactions at the conversation level. The local representation is often inferred from the temporal information of speakers or emotional shifts, which neglects vital factors at the utterance level. Additionally, most existing approaches fuse features of multiple modalities into a unified input without leveraging modality-specific representations. Motivated by these problems, we propose the Relational Temporal Graph Neural Network with Auxiliary Cross-Modality Interaction (CORECT), a novel neural network framework that effectively captures conversation-level cross-modality interactions and utterance-level temporal dependencies in a modality-specific manner for conversation understanding. Extensive experiments demonstrate the effectiveness of CORECT via its state-of-the-art results on the IEMOCAP and CMU-MOSEI datasets for the multimodal ERC task.
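
The abstract names two mechanisms: a relational graph over modality-specific utterance nodes for utterance-level temporal dependencies, and cross-modal attention for conversation-level interactions. The sketch below illustrates both with PyTorch and PyTorch Geometric; the node/edge scheme in build_edges, the relation ids, the dimensions, and the fusion head are hypothetical simplifications for illustration, not the authors' exact CORECT implementation.

```python
# Minimal sketch of the two components the abstract describes, using
# PyTorch and PyTorch Geometric. All names and hyperparameters below
# are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
from torch_geometric.nn import RGCNConv


def build_edges(num_utt: int, num_mod: int = 3, window: int = 2):
    """Hypothetical edge construction. Node id of utterance t in modality m
    is t * num_mod + m. Relations: per-modality 'future'/'past' temporal
    links inside the window, plus one relation per modality pair linking
    the nodes of the same utterance."""
    src, dst, rel = [], [], []
    for m in range(num_mod):                      # temporal, modality-specific
        for t in range(num_utt):
            for d in range(1, window + 1):
                if t + d < num_utt:
                    u, v = t * num_mod + m, (t + d) * num_mod + m
                    src += [u, v]
                    dst += [v, u]
                    rel += [2 * m, 2 * m + 1]     # u->v future, v->u past
    k = 2 * num_mod                               # cross-modal, same utterance
    for a in range(num_mod):
        for b in range(a + 1, num_mod):
            for t in range(num_utt):
                u, v = t * num_mod + a, t * num_mod + b
                src += [u, v]
                dst += [v, u]
                rel += [k, k]
            k += 1
    return torch.tensor([src, dst]), torch.tensor(rel), k  # k = num_relations


class RelationalTemporalGraph(nn.Module):
    """Utterance-level temporal dependencies over modality-specific nodes,
    with edge types handled by a relational graph convolution."""

    def __init__(self, dim: int, num_relations: int):
        super().__init__()
        self.conv = RGCNConv(dim, dim, num_relations=num_relations)

    def forward(self, x, edge_index, edge_type):
        return torch.relu(self.conv(x, edge_index, edge_type))


class CrossModalAttention(nn.Module):
    """Conversation-level cross-modality interaction: one modality's
    sequence attends over another modality's full sequence."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, query_seq, context_seq):
        out, _ = self.attn(query_seq, context_seq, context_seq)
        return out


if __name__ == "__main__":
    T, D = 8, 64                                   # 8 utterances, 64-d features
    text, audio, vision = (torch.randn(1, T, D) for _ in range(3))

    # global context: text queries attend over the whole audio stream
    text_ctx = CrossModalAttention(D)(text, audio)

    # local context: relational graph over per-modality utterance nodes
    x = torch.stack([text[0], audio[0], vision[0]], dim=1).reshape(T * 3, D)
    edge_index, edge_type, R = build_edges(T)
    h = RelationalTemporalGraph(D, R)(x, edge_index, edge_type)

    # fuse both views and classify each utterance into 6 emotion classes
    utt = torch.cat([h.reshape(T, 3 * D), text_ctx[0]], dim=-1)
    print(nn.Linear(4 * D, 6)(utt).shape)          # torch.Size([8, 6])
```

The demo keeps the two views separate until the last step: the relational graph supplies utterance-level, modality-specific states, the attention module supplies conversation-level cross-modal context, and a linear head fuses them per utterance, mirroring the division of labor the abstract describes.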

References

The graph neural network model
Modeling Relational Data with Graph Convolutional Networks
IEMOCAP: Interactive emotional dyadic motion capture database

Cited by

Multi-corpus emotion recognition method based on cross-modal gated attention fusion
Mi-CGA: Cross-modal Graph Attention Network for robust emotion recognition in the presence of incomplete modalities
AC-EIC: addressee-centered emotion inference in conversations

Cite (APA)

Nguyen, C. V. T., Mai, A. T., Le, T. S., Kieu, H. D., & Le, D. T. (2023). Conversation Understanding using Relational Temporal Graph Neural Networks with Auxiliary Cross-Modality Interaction. In EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings (pp. 15154–15167). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.emnlp-main.937

Readers' Seniority

PhD / Postgrad / Masters / Doc: 3 (75%)
Lecturer / Post doc: 1 (25%)

Readers' Discipline

Computer Science: 5 (83%)
Medicine and Dentistry: 1 (17%)
