Multimodal language analysis with recurrent multistage fusion


Abstract

Computational modeling of human multimodal language is an emerging research area in natural language processing spanning the language, visual, and acoustic modalities. Comprehending multimodal language requires modeling not only the interactions within each modality (intra-modal interactions) but, more importantly, the interactions between modalities (cross-modal interactions). In this paper, we propose the Recurrent Multistage Fusion Network (RMFN), which decomposes the fusion problem into multiple stages, each focusing on a subset of multimodal signals for specialized, effective fusion. Cross-modal interactions are modeled using this multistage fusion approach, in which each stage builds upon the intermediate representations of previous stages. Temporal and intra-modal interactions are modeled by integrating our proposed fusion approach with a system of recurrent neural networks. RMFN achieves state-of-the-art performance in modeling human multimodal language across three public datasets covering multimodal sentiment analysis, emotion recognition, and speaker trait recognition. We provide visualizations showing that each stage of fusion focuses on a different subset of multimodal signals, learning increasingly discriminative multimodal representations.
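The abstract describes the architecture only at a high level. The following is a minimal, stdlib-only Python sketch of the general idea, built on our own assumptions rather than on the paper's equations. It makes four simplifying choices:

- One plain tanh RNN per modality stands in for the "system of recurrent neural networks."
- Each fusion stage computes a sigmoid mask over the concatenated hidden states, conditioned on the previous stage's output. This is how the sketch lets a stage "focus on a subset" of signals.
- Each stage fuses the masked signals with the previous stage's representation.
- The final fused vector is fed back into the recurrent cells at the next time step.

All names (`ModalityRNN`, `MultistageFusion`, `run_rmfn_sketch`), dimensions and update rules are hypothetical. The weights are random and untrained, so the sketch only illustrates data flow. It is not the authors' RMFN.

```python
import math
import random

# Hypothetical illustration only: names, shapes, and update rules are assumptions,
# not the authors' RMFN code. Weights are random and untrained.


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class ModalityRNN:
    """Plain tanh RNN for one modality (a stand-in for the paper's recurrent networks)."""

    def __init__(self, input_dim, hidden_dim, fused_dim, rng):
        self.W = rand_matrix(hidden_dim, input_dim + hidden_dim + fused_dim, rng)

    def step(self, x, h, z):
        # Next hidden state from the current input x, this modality's previous
        # state h, and the previous fused multimodal vector z.
        return [math.tanh(a) for a in matvec(self.W, x + h + z)]


class MultistageFusion:
    """K stages; each soft-selects a subset of signals and refines the fused state."""

    def __init__(self, input_dim, fused_dim, num_stages, rng):
        self.fused_dim = fused_dim
        ctx = input_dim + fused_dim
        self.attn = [rand_matrix(input_dim, ctx, rng) for _ in range(num_stages)]
        self.fuse = [rand_matrix(fused_dim, ctx, rng) for _ in range(num_stages)]

    def __call__(self, signals):
        fused = [0.0] * self.fused_dim
        masks = []
        for attn_k, fuse_k in zip(self.attn, self.fuse):
            # Sigmoid mask over the concatenated signals, conditioned on the
            # previous stage's output: which signals this stage focuses on.
            mask = [sigmoid(a) for a in matvec(attn_k, signals + fused)]
            selected = [m * s for m, s in zip(mask, signals)]
            # Fuse the selected signals with the previous stage's representation.
            fused = [math.tanh(a) for a in matvec(fuse_k, selected + fused)]
            masks.append(mask)
        return fused, masks


def run_rmfn_sketch(sequences, hidden_dim=4, fused_dim=3, num_stages=3, seed=0):
    """sequences: dict mapping modality -> list of T feature vectors (same T for all)."""
    rng = random.Random(seed)
    names = list(sequences)
    rnns = {m: ModalityRNN(len(sequences[m][0]), hidden_dim, fused_dim, rng) for m in names}
    fusion = MultistageFusion(hidden_dim * len(names), fused_dim, num_stages, rng)
    h = {m: [0.0] * hidden_dim for m in names}
    z = [0.0] * fused_dim
    masks = []
    for t in range(len(sequences[names[0]])):
        for m in names:  # temporal and intra-modal dynamics, one RNN per modality
            h[m] = rnns[m].step(sequences[m][t], h[m], z)
        # Cross-modal interactions at this time step via the multistage fusion.
        z, masks = fusion([v for m in names for v in h[m]])
    return z, masks


if __name__ == "__main__":
    T = 4
    data = {
        "language": [[0.1 * t] * 5 for t in range(T)],
        "visual": [[0.2, -0.1, 0.05] for _ in range(T)],
        "acoustic": [[-0.3, 0.4] for _ in range(T)],
    }
    z, masks = run_rmfn_sketch(data)
    print("final fused vector:", [round(v, 3) for v in z])
    for k, mask in enumerate(masks):
        print(f"stage {k} mask:", [round(m, 2) for m in mask])
```

In this toy, printing the per-stage masks is loosely analogous to the paper's visualizations of which signals each stage attends to. With untrained weights, however, the masks carry no meaning.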


Citation (APA)

Liang, P. P., Liu, Z., Zadeh, A., & Morency, L. P. (2018). Multimodal language analysis with recurrent multistage fusion. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, EMNLP 2018 (pp. 150–161). Association for Computational Linguistics. https://doi.org/10.18653/v1/d18-1014
