VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification

Abstract

Sentiment classification is a key task in mining people's opinions, and improved sentiment classification can help individuals make better decisions. Rather than relying on text alone, social media users increasingly combine images and text to express their opinions and share their experiences, so learning how to fully exploit both modalities is critical for many tasks, including sentiment classification. In this work, we propose a new multimodal sentiment classification approach: the Visual Distillation and Attention Network (VisdaNet). First, a knowledge augmentation module compensates for the limited information in short texts by integrating them with image captions. Second, to address the information control problem during multimodal fusion in product review scenarios, we propose a CLIP-based knowledge distillation module that reduces noise in the original modalities and improves the quality of their representations. Finally, for the single-text, multi-image fusion problem typical of product reviews, we propose CLIP-based visual aspect attention, which models the text-image interaction in this setting and performs feature-level fusion across modalities. Experimental results on the Yelp multimodal dataset show that our model outperforms the previous state-of-the-art model, and ablation studies demonstrate the effectiveness of each of the proposed components.
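
To make the fusion step concrete, below is a minimal, illustrative sketch of the single-text, multi-image attention idea the abstract describes: the text embedding acts as a query over several image embeddings, and the attended visual feature is concatenated with the text feature for classification. This is a sketch under stated assumptions, not the authors' implementation; the class name VisualAspectAttention, the 512-dimensional embedding size, and the random tensors standing in for CLIP text/image encoder outputs are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAspectAttention(nn.Module):
    """Illustrative text-query attention over multiple image embeddings
    (a sketch of single-text, multi-image fusion; names are hypothetical)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)  # projects the text (query) embedding
        self.key_proj = nn.Linear(dim, dim)    # projects each image (key) embedding
        self.scale = dim ** -0.5               # standard scaled dot-product factor

    def forward(self, text_emb, image_embs):
        # text_emb:   (batch, dim)            one review text per sample
        # image_embs: (batch, n_images, dim)  several images per review
        q = self.query_proj(text_emb).unsqueeze(1)                    # (batch, 1, dim)
        k = self.key_proj(image_embs)                                 # (batch, n, dim)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (batch, 1, n)
        visual = (attn @ image_embs).squeeze(1)                       # (batch, dim)
        # feature-level fusion: concatenate text and attended visual features
        return torch.cat([text_emb, visual], dim=-1)                  # (batch, 2*dim)

# Stand-in usage: in practice the embeddings would come from CLIP's encoders.
fusion = VisualAspectAttention(dim=512)
text_emb = torch.randn(2, 512)        # 2 reviews
image_embs = torch.randn(2, 5, 512)   # 5 images per review
fused = fusion(text_emb, image_embs)  # shape (2, 1024), fed to a sentiment classifier
```

Using the text as the attention query reflects the review setting: a single short text usually refers unevenly to its attached images, so attention lets the model upweight the images that actually match the review's content before fusing the modalities.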

Citation (APA)

Hou, S., Tuerhong, G., & Wushouer, M. (2023). VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification. Sensors, 23(2). https://doi.org/10.3390/s23020661
