Multimodal End-to-End Sparse Model for Emotion Recognition

Wenliang Dai; Samuel Cahyawijaya; Zihan Liu; Pascale Fung

Conference Proceedings

NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (2021) 5305-5316

DOI: 10.18653/v1/2021.naacl-main.417

Abstract

Existing works on multimodal affective computing tasks, such as emotion recognition, generally adopt a two-phase pipeline, first extracting feature representations for each single modality with hand-crafted algorithms and then performing end-to-end learning with the extracted features. However, the extracted features are fixed and cannot be further fine-tuned on different target tasks, and manually finding feature extraction algorithms does not generalize or scale well to different tasks, which can lead to sub-optimal performance. In this paper, we develop a fully end-to-end model that connects the two phases and optimizes them jointly. In addition, we restructure the current datasets to enable the fully end-to-end training. Furthermore, to reduce the computational overhead brought by the end-to-end model, we introduce a sparse cross-modal attention mechanism for the feature extraction. Experimental results show that our fully end-to-end model significantly surpasses the current state-of-the-art models based on the two-phase pipeline. Moreover, by adding the sparse cross-modal attention, our model can maintain performance with around half the computation in the feature extraction part.
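The abstract does not say how the sparse cross-modal attention is computed. The following is a rough illustration only and is not the authors' implementation. It assumes one common pattern: a pooled text vector scores each visual region with scaled dot-product attention, and only the top fraction of regions goes on to the more expensive feature extractor. Here `keep_ratio=0.5` echoes the "around half the computation" figure. The function name `sparse_cross_modal_select`, the top-k rule, and the weight renormalization are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparse_cross_modal_select(
    text_query: Sequence[float],
    region_feats: Sequence[Sequence[float]],
    keep_ratio: float = 0.5,
) -> Tuple[List[int], List[float]]:
    """Hypothetical text-guided region selection (illustrative, not the paper's code).

    Scores each visual region against a pooled text query using scaled
    dot-product attention, keeps the top ceil(keep_ratio * n) regions, and
    renormalizes the kept attention weights so they sum to 1.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    scale = 1.0 / math.sqrt(len(text_query))
    # Scaled dot product between the text query and every region feature.
    scores = [scale * sum(q * r for q, r in zip(text_query, feat)) for feat in region_feats]
    weights = softmax(scores)
    k = max(1, math.ceil(keep_ratio * len(region_feats)))
    # Pick the k highest-weighted regions, then restore their original order.
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    kept = sorted(ranked[:k])
    kept_mass = sum(weights[i] for i in kept)
    return kept, [weights[i] / kept_mass for i in kept]


# Usage: four toy 2-D region features; only half are kept.
text_query = [1.0, 0.0]
regions = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
kept, kept_weights = sparse_cross_modal_select(text_query, regions, keep_ratio=0.5)
print(kept)  # [0, 2]
```

In a sketch like this, the saving comes from running the downstream extractor only on the `kept` regions. The exact selection rule used in the paper may differ.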

Citation (APA)

Dai, W., Cahyawijaya, S., Liu, Z., & Fung, P. (2021). Multimodal End-to-End Sparse Model for Emotion Recognition. In NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Conference (pp. 5305–5316). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2021.naacl-main.417