In this work, we propose a new multimodal fusion architecture for sentiment analysis. The three modalities used in this paper are text, audio, and video. Most current methods perform either feature-level or decision-level fusion. In contrast, we propose an attention-based deep neural network, together with a training approach, that facilitates both feature-level and decision-level fusion. Our network effectively leverages information across all three modalities through a two-stage fusion process. We evaluate the network on individual utterance-based contextual information extracted from the CMU-MOSI dataset and compare its performance against the state of the art.
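The abstract does not specify the network's layers. As a rough illustration only, the sketch below shows one generic way to combine attention-weighted feature-level fusion with decision-level fusion across text, audio, and video. It is not the authors' Trimodal Attention Module. The equal feature dimension per modality, the single query vector used to score modalities, the one shared linear classifier, and the equal-weight averaging of predictions are all assumptions made for illustration, and the names `attention_fuse` and `two_stage_predict` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def attention_fuse(features: Dict[str, Vector],
                   query: Vector) -> Tuple[Vector, Dict[str, float]]:
    """Stage 1 (feature level): weight each modality by a softmax over
    query-feature dot products, then sum the weighted feature vectors.
    Assumption: all modalities share the query's dimension."""
    names = list(features)
    weights = softmax([dot(features[n], query) for n in names])
    fused = [sum(w * features[n][i] for w, n in zip(weights, names))
             for i in range(len(query))]
    return fused, dict(zip(names, weights))


def two_stage_predict(features: Dict[str, Vector], query: Vector,
                      classifier: Vector) -> float:
    """Stage 2 (decision level): average the positive-sentiment
    probabilities from each unimodal vector and from the fused vector.
    Assumption: one shared linear classifier scores every vector."""
    fused, _ = attention_fuse(features, query)
    probs = [sigmoid(dot(classifier, v)) for v in features.values()]
    probs.append(sigmoid(dot(classifier, fused)))
    return sum(probs) / len(probs)


if __name__ == "__main__":
    feats = {"text": [1.0, 0.0], "audio": [0.0, 1.0], "video": [0.5, 0.5]}
    _, weights = attention_fuse(feats, query=[1.0, 0.0])
    print(weights)  # text gets the largest attention weight for this query
    print(two_stage_predict(feats, [1.0, 0.0], [2.0, -1.0]))
```

Averaging the unimodal and fused predictions is only the simplest decision-level rule; a learned weighting over those predictions would be a natural alternative.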
Harish, A. B., & Sadat, F. (2020). Trimodal Attention Module for Multimodal Sentiment Analysis. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 13803–13804). AAAI Press.