Multi-modal sentiment and emotion analysis is an emerging and prominent field at the intersection of natural language processing, deep learning, machine learning, computer vision, and speech processing. Sentiment and emotion prediction models determine the attitude of a speaker or writer towards a discussion, debate, event, document, or topic. This attitude can be expressed in different ways, such as the words spoken, the energy and tone of delivery, and the accompanying facial expressions and gestures. Moreover, related and similar tasks generally depend on each other and are predicted better when solved through a joint framework. In this paper, we present a multi-task gated contextual cross-modal attention framework that considers all three modalities (viz. text, acoustic, and visual) and multiple utterances together for sentiment and emotion prediction. We evaluate our proposed approach on the CMU-MOSEI dataset for sentiment and emotion prediction. Evaluation results show that our proposed approach captures the correlation among the three modalities and improves over previous state-of-the-art models.
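The core idea of gated cross-modal attention can be illustrated with a minimal sketch: one modality's utterance features attend over another modality's utterances, and a sigmoid gate blends each feature with its attended context. This is an illustrative toy in pure Python, not the paper's actual architecture; the feature values, dimensions, and the parameter-free gate are assumptions for demonstration (the real model learns projection weights).

```python
import math

def softmax(xs):
    # numerically stable softmax over a list of scores
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross_modal_attention(query_utts, key_utts):
    """For each query-modality utterance, attend over the key-modality utterances."""
    attended = []
    for q in query_utts:
        weights = softmax([dot(q, k) for k in key_utts])
        # weighted sum of key-modality utterance vectors
        ctx = [sum(w * k[i] for w, k in zip(weights, key_utts))
               for i in range(len(q))]
        attended.append(ctx)
    return attended

def gate(h, ctx):
    """Element-wise sigmoid gate blending a feature vector with its attended context.
    A learned projection would normally produce the gate input; here it is just h + ctx."""
    g = [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(h, ctx)]
    return [gi * ci + (1.0 - gi) * hi for gi, hi, ci in zip(g, h, ctx)]

# toy example: 2 utterances with 3-dim features per modality (hypothetical values)
text = [[0.2, 0.1, 0.4], [0.5, 0.3, 0.1]]
acoustic = [[0.3, 0.2, 0.2], [0.1, 0.4, 0.6]]
fused = [gate(t, c) for t, c in zip(text, cross_modal_attention(text, acoustic))]
```

In the full framework, such attended-and-gated representations would be computed for each ordered pair of modalities and then combined with the contextual utterance features before the multi-task sentiment and emotion heads.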
Citation:
Sangwan, S., Chauhan, D. S., Akhtar, M. S., Ekbal, A., & Bhattacharyya, P. (2019). Multi-task gated contextual cross-modal attention framework for sentiment and emotion analysis. In Communications in Computer and Information Science (Vol. 1142 CCIS, pp. 662–669). Springer. https://doi.org/10.1007/978-3-030-36808-1_72