Scene recognition is an active topic in micro-video understanding, where multi-modal information is commonly used because of its rich representation ability. However, exploiting multi-modal information is challenging: the semantic consistency among modalities in micro-videos is weaker than in traditional videos, and the contributions of the individual modalities usually differ. To address these issues, a multi-modal enhancement semantic learning method is proposed for micro-video scene recognition in this study. In the proposed method, the visual modality is treated as the main modality, whereas other modalities such as text and audio are treated as auxiliary modalities. We propose a deep multi-modal fusion network for scene recognition that enhances the semantics of the auxiliary modalities using the main modality. Furthermore, the fusion weights of the modalities are learned adaptively in the proposed method. The experiments demonstrate the effectiveness of the enhancement and adaptive weight learning in multi-modal fusion for micro-video scene recognition.
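The abstract describes the architecture only at a high level: visual features guide (enhance) the text and audio features, and per-modality fusion weights are learned rather than fixed. The following is a minimal PyTorch sketch of that idea, assuming a gating mechanism for the enhancement and a softmax over learned scores for the adaptive weights; the feature dimensions, module names, and the specific gating/weighting choices are illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn


class EnhancedFusionNet(nn.Module):
    """Sketch: visual (main) modality enhances text/audio (auxiliary) modalities,
    then the three modalities are fused with adaptively learned weights."""

    def __init__(self, dim=256, num_classes=32):
        super().__init__()
        # Project each modality into a shared feature space.
        self.proj_visual = nn.Linear(dim, dim)
        self.proj_text = nn.Linear(dim, dim)
        self.proj_audio = nn.Linear(dim, dim)
        # Gates that let the visual modality enhance the auxiliary modalities.
        self.gate_text = nn.Linear(2 * dim, dim)
        self.gate_audio = nn.Linear(2 * dim, dim)
        # Scores used to learn adaptive fusion weights over the three modalities.
        self.weight_scores = nn.Linear(dim, 1)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, visual, text, audio):
        v = torch.relu(self.proj_visual(visual))
        t = torch.relu(self.proj_text(text))
        a = torch.relu(self.proj_audio(audio))

        # Enhance auxiliary features with visual semantics via a sigmoid gate.
        t = t * torch.sigmoid(self.gate_text(torch.cat([v, t], dim=-1)))
        a = a * torch.sigmoid(self.gate_audio(torch.cat([v, a], dim=-1)))

        # Adaptive fusion: one scalar per modality, normalized across modalities.
        feats = torch.stack([v, t, a], dim=1)                       # (B, 3, dim)
        weights = torch.softmax(self.weight_scores(feats), dim=1)   # (B, 3, 1)
        fused = (weights * feats).sum(dim=1)                        # (B, dim)
        return self.classifier(fused)


# Usage with random tensors standing in for per-modality embeddings.
net = EnhancedFusionNet()
logits = net(torch.randn(4, 256), torch.randn(4, 256), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 32])
```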
Guo, J., Nie, X., & Yin, Y. (2020). Mutual Complementarity: Multi-Modal Enhancement Semantic Learning for Micro-Video Scene Recognition. IEEE Access, 8, 29518–29524. https://doi.org/10.1109/ACCESS.2020.2973240