Multimodal Sentiment Analysis (MSA) has made great progress, benefiting from advanced fusion schemes. However, labeled data remain scarce, which causes severe overfitting and poor generalization for supervised models applied in this field. In this paper, we propose Sentiment Knowledge Enhanced Self-supervised Learning (SKESL) to capture common sentiment patterns in unlabeled videos, which facilitates further learning on limited labeled data. Specifically, with the help of sentiment knowledge and non-verbal behavior, SKESL conducts sentiment word masking and predicts fine-grained word sentiment intensity, so as to embed word-level sentiment information into the pre-trained multimodal representation. In addition, a non-verbal injection method is proposed to integrate non-verbal information into the word semantics. Experiments on two standard MSA benchmarks show that SKESL significantly outperforms the baselines and achieves new State-Of-The-Art (SOTA) results.
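The abstract names two mechanisms: masking sentiment words and regressing their fine-grained intensity, and injecting non-verbal (audio/visual) features into word semantics. Below is a minimal PyTorch sketch of how such a pre-training objective could be wired up, assuming word-aligned audio/visual features and a lexicon of sentiment-word token ids. All module names (NonVerbalInjection, SentimentIntensityHead, mask_sentiment_words), the gated residual fusion, and the MSE objective are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NonVerbalInjection(nn.Module):
    """Hypothetical gated fusion: inject word-aligned audio/visual
    features into word embeddings (a sketch, not the authors' method)."""

    def __init__(self, d_text, d_audio, d_visual):
        super().__init__()
        self.proj = nn.Linear(d_audio + d_visual, d_text)
        self.gate = nn.Linear(d_text * 2, d_text)

    def forward(self, text_emb, audio_feat, visual_feat):
        # Project concatenated non-verbal features into the text space.
        nonverbal = self.proj(torch.cat([audio_feat, visual_feat], dim=-1))
        # Gate controls how much non-verbal signal each word absorbs.
        g = torch.sigmoid(self.gate(torch.cat([text_emb, nonverbal], dim=-1)))
        return text_emb + g * nonverbal  # gated residual injection


def mask_sentiment_words(token_ids, sentiment_vocab, mask_id):
    """Mask only tokens that appear in a sentiment lexicon,
    rather than masking uniformly at random as in standard MLM."""
    is_sentiment = torch.isin(token_ids, sentiment_vocab)
    masked = token_ids.clone()
    masked[is_sentiment] = mask_id
    return masked, is_sentiment


class SentimentIntensityHead(nn.Module):
    """Regress a fine-grained sentiment intensity score (e.g., from a
    knowledge base) at each masked sentiment-word position."""

    def __init__(self, d_model):
        super().__init__()
        self.regressor = nn.Linear(d_model, 1)

    def forward(self, hidden, mask_positions, target_intensity):
        # hidden: (batch, seq, d_model) from the multimodal encoder.
        pred = self.regressor(hidden[mask_positions]).squeeze(-1)
        return F.mse_loss(pred, target_intensity)
```

Under this reading, the self-supervised loss would combine recovery of the masked sentiment words with the intensity regression above, so that the pre-trained representation encodes word-level sentiment before fine-tuning on the limited labeled MSA data.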
Qian, F., Han, J., He, Y., Zheng, T., & Zheng, G. (2023). Sentiment Knowledge Enhanced Self-supervised Learning for Multimodal Sentiment Analysis. In Findings of the Association for Computational Linguistics: ACL 2023 (pp. 12966–12978). Association for Computational Linguistics. https://doi.org/10.18653/v1/2023.findings-acl.821