Multimodal Multitask Emotion Recognition using Images, Texts and Tags

Abstract

Recently, multimodal emotion recognition has received increasing interest due to its potential to improve performance by leveraging complementary sources of information. In this work, we explore the use of images, texts and tags for emotion recognition. However, using several modalities also comes with an additional challenge that is often ignored, namely the problem of the "missing modality": social media users do not always publish content containing an image, text and tags, so one or two modalities are often missing at test time. Similarly, labeled training data that contain all modalities can be limited. Taking this into consideration, we propose a multimodal model that leverages a multitask framework to enable training on data composed of an arbitrary number of modalities, while also allowing predictions with missing modalities. We show that our approach is robust to one or two missing modalities at test time. Moreover, this framework makes it easy to fine-tune parts of the model with unimodal and bimodal training data, which can further improve overall performance. Finally, our experiments indicate that this multitask learning also acts as a regularization mechanism that improves generalization.
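To make the multitask idea concrete, the sketch below shows one possible way to realize it in PyTorch: each modality (image, text, tags) gets its own encoder and its own emotion-classification head, plus a shared head over the fused representation, so any subset of modalities can be used for training or prediction. The layer sizes, feature dimensions, mean fusion, and module names are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of a multimodal multitask classifier that tolerates missing
# modalities. Dimensions and fusion-by-averaging are assumptions for illustration.
import torch
import torch.nn as nn


class MultitaskEmotionNet(nn.Module):
    def __init__(self, dims, hidden=256, n_emotions=8):
        super().__init__()
        # One encoder and one unimodal classification head per modality.
        self.encoders = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(d, hidden), nn.ReLU()) for m, d in dims.items()
        })
        self.unimodal_heads = nn.ModuleDict({
            m: nn.Linear(hidden, n_emotions) for m in dims
        })
        # Shared head applied to the fused representation of whatever is present.
        self.fused_head = nn.Linear(hidden, n_emotions)

    def forward(self, features):
        # `features` maps modality name -> feature tensor; absent modalities are omitted.
        logits, encoded = {}, []
        for m, x in features.items():
            h = self.encoders[m](x)
            logits[m] = self.unimodal_heads[m](h)  # per-modality auxiliary task
            encoded.append(h)
        # Mean fusion works for any number of available modalities.
        logits["fused"] = self.fused_head(torch.stack(encoded).mean(dim=0))
        return logits


# Example: prediction from an image-only sample (text and tags missing).
model = MultitaskEmotionNet({"image": 2048, "text": 768, "tags": 300})
out = model({"image": torch.randn(1, 2048)})
print(out["image"].shape, out["fused"].shape)  # torch.Size([1, 8]) twice
```

Because each unimodal head defines its own loss term, partially labeled unimodal or bimodal data can be used to fine-tune the corresponding encoders, which is the property the abstract highlights.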

Cite

APA

Fortin, M. P., & Chaib-Draa, B. (2019). Multimodal Multitask Emotion Recognition using Images, Texts and Tags. In WCRML 2019 - Proceedings of the ACM Workshop on Crossmodal Learning and Application (pp. 3–10). Association for Computing Machinery, Inc. https://doi.org/10.1145/3326459.3329165
