VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification

Abstract

Sentiment classification is a key task in mining people's opinions, and improved sentiment classification can help individuals make better decisions. Rather than relying on text alone, social media users increasingly combine images and text to express their opinions and share their experiences, so learning how to fully exploit both modalities is critical for many tasks, including sentiment classification. In this work, we propose a new multimodal sentiment classification approach: the Visual Distillation and Attention Network (VisdaNet). First, a knowledge augmentation module compensates for the limited information in short texts by integrating them with image captions. Second, to address the information control problem during multimodal fusion in product review scenarios, we propose a CLIP-based knowledge distillation module that reduces noise in the original modalities and improves the quality of their representations. Finally, for the single-text, multi-image fusion problem typical of product reviews, we propose CLIP-based visual aspect attention, which models the text-image interaction in this setting and performs feature-level fusion across modalities. Experimental results on the Yelp multimodal dataset show that our model outperforms the previous state-of-the-art model, and ablation studies demonstrate the effectiveness of each of the proposed components.
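
To make the fusion step concrete, below is a minimal, illustrative sketch of the single-text, multi-image attention idea the abstract describes: the text embedding acts as a query over several image embeddings, and the attended visual feature is concatenated with the text feature for classification. This is a sketch under stated assumptions, not the authors' implementation; the class name VisualAspectAttention, the 512-dimensional embedding size, and the random tensors standing in for CLIP text/image encoder outputs are all hypothetical.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualAspectAttention(nn.Module):
    """Illustrative text-query attention over multiple image embeddings
    (a sketch of single-text, multi-image fusion; names are hypothetical)."""

    def __init__(self, dim: int = 512):
        super().__init__()
        self.query_proj = nn.Linear(dim, dim)  # projects the text (query) embedding
        self.key_proj = nn.Linear(dim, dim)    # projects each image (key) embedding
        self.scale = dim ** -0.5               # standard scaled dot-product factor

    def forward(self, text_emb, image_embs):
        # text_emb:   (batch, dim)            one review text per sample
        # image_embs: (batch, n_images, dim)  several images per review
        q = self.query_proj(text_emb).unsqueeze(1)                    # (batch, 1, dim)
        k = self.key_proj(image_embs)                                 # (batch, n, dim)
        attn = F.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (batch, 1, n)
        visual = (attn @ image_embs).squeeze(1)                       # (batch, dim)
        # feature-level fusion: concatenate text and attended visual features
        return torch.cat([text_emb, visual], dim=-1)                  # (batch, 2*dim)

# Stand-in usage: in practice the embeddings would come from CLIP's encoders.
fusion = VisualAspectAttention(dim=512)
text_emb = torch.randn(2, 512)        # 2 reviews
image_embs = torch.randn(2, 5, 512)   # 5 images per review
fused = fusion(text_emb, image_embs)  # shape (2, 1024), fed to a sentiment classifier
```

Using the text as the attention query reflects the review setting: a single short text usually refers unevenly to its attached images, so attention lets the model upweight the images that actually match the review's content before fusing the modalities.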

Citation (APA)

Hou, S., Tuerhong, G., & Wushouer, M. (2023). VisdaNet: Visual Distillation and Attention Network for Multimodal Sentiment Classification. Sensors, 23(2). https://doi.org/10.3390/s23020661
