Temporal Graph Convolutional Network for Multimodal Sentiment Analysis

Abstract

In this paper, we propose a temporal graph convolutional network (TGCN) to recognize sentiment from the language (textual), acoustic, and visual (facial expression) modalities. TGCN constructs a modality-specific graph whose nodes are the aligned segments of the multimodal utterances and whose edges are weighted according to the distances between their features, in order to learn node embeddings that capture the sequential semantics underlying the utterances. In particular, we use positional encoding with interleaved sine and cosine embeddings to encode the positions of the segments within the utterances into their features. Given the modality-specific segment embeddings, we apply an attention mechanism over the segments to capture the sentiment-related ones and obtain unified utterance embeddings. Furthermore, we fuse the attended embeddings of the multimodal utterances and apply attention to capture their interactions. Finally, the fused embeddings are concatenated with the raw features for sentiment prediction. Extensive experiments on three publicly available datasets show that TGCN outperforms state-of-the-art methods.
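
Two components named in the abstract, the interleaved sine/cosine positional encoding and the graph whose edge weights depend on distances between segment features, can be sketched compactly. The following is a minimal NumPy sketch under stated assumptions, not the authors' implementation: the Gaussian kernel for edge weights, the symmetric GCN normalization, and all function names (positional_encoding, distance_weighted_adjacency, gcn_layer) are illustrative choices.

import numpy as np

def positional_encoding(num_segments, dim):
    """Interleaved sine/cosine positional encoding (Transformer-style)."""
    pe = np.zeros((num_segments, dim))
    positions = np.arange(num_segments)[:, None]                     # (T, 1)
    div = np.exp(np.arange(0, dim, 2) * (-np.log(10000.0) / dim))    # (dim/2,)
    pe[:, 0::2] = np.sin(positions * div)   # even indices: sine
    pe[:, 1::2] = np.cos(positions * div)   # odd indices: cosine
    return pe

def distance_weighted_adjacency(features, sigma=1.0):
    """Edge weights from pairwise segment-feature distances (Gaussian kernel assumed)."""
    diff = features[:, None, :] - features[None, :, :]               # (T, T, D)
    dist2 = np.sum(diff ** 2, axis=-1)                               # squared distances
    adj = np.exp(-dist2 / (2.0 * sigma ** 2))                        # closer segments -> larger weight
    np.fill_diagonal(adj, 0.0)                                       # self-loops added later
    return adj

def gcn_layer(adj, node_feats, weight):
    """One graph-convolution step: self-loops, symmetric normalization, linear map, ReLU."""
    adj_hat = adj + np.eye(adj.shape[0])
    deg_inv_sqrt = 1.0 / np.sqrt(adj_hat.sum(axis=1))
    norm_adj = adj_hat * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]
    return np.maximum(norm_adj @ node_feats @ weight, 0.0)

# Toy example: 8 aligned segments of one modality with 16-d features.
rng = np.random.default_rng(0)
segments = rng.normal(size=(8, 16))
segments = segments + positional_encoding(8, 16)   # inject sequential order into features
adj = distance_weighted_adjacency(segments)
w = rng.normal(scale=0.1, size=(16, 16))
node_embeddings = gcn_layer(adj, segments, w)
print(node_embeddings.shape)                        # (8, 16)

In the full model, one such graph would be built per modality, the resulting node embeddings attended over to form utterance embeddings, and the attended embeddings of the modalities fused before prediction.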

Citation (APA)

Huang, J., Lin, Z., Yang, Z., & Liu, W. (2021). Temporal Graph Convolutional Network for Multimodal Sentiment Analysis. In ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction (pp. 239–247). Association for Computing Machinery, Inc. https://doi.org/10.1145/3462244.3479939
