Abstract
The problem of semantic segmentation in computer vision has regained researchers' interest thanks to advances in deep learning. This research investigates multi-modal semantic segmentation on images with two modalities, RGB and depth, taking RGB-D images as input. For cross-modal calibration and fusion, this research presents a novel FFCA module, which improves segmentation results by acquiring complementary information from the two modalities. The module is plug-and-play and can be integrated into existing neural networks. To validate it, a multi-modal semantic segmentation network named FFCANet has been designed, featuring a dual-branch encoder and a global context module built on the classic combination of a ResNet backbone with DeepLabV3+. Compared with the baseline, the proposed model substantially improves accuracy on the semantic segmentation task.
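The abstract does not spell out the internals of the FFCA module, so the sketch below is only a hypothetical illustration of what a plug-and-play cross-modal fusion attention block for a dual-branch RGB-D encoder might look like, assuming a channel-attention-style design. The class name FFCAModule, the reduction ratio, and the sum-fusion step are assumptions for illustration, not the authors' published implementation.

```python
import torch
import torch.nn as nn

class FFCAModule(nn.Module):
    """Hypothetical cross-modal fusion attention block (illustrative sketch only).

    Takes same-shaped RGB and depth feature maps, derives channel-attention
    weights from their concatenated global descriptors, and returns
    recalibrated, fused features.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Shared MLP producing per-modality channel weights from the
        # concatenated global descriptors of both branches.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, (2 * channels) // reduction),
            nn.ReLU(inplace=True),
            nn.Linear((2 * channels) // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb.shape
        # Global average-pooled descriptors of both modalities, concatenated.
        desc = torch.cat([self.pool(rgb), self.pool(depth)], dim=1).flatten(1)
        w = self.mlp(desc).view(b, 2 * c, 1, 1)
        w_rgb, w_depth = w[:, :c], w[:, c:]
        # Recalibrate each branch with cross-modal weights, then fuse by sum.
        return rgb * w_rgb + depth * w_depth

# Usage sketch: fuse ResNet stage outputs of equal shape from the two branches.
# fuse = FFCAModule(channels=256)
# fused = fuse(rgb_feat, depth_feat)   # both tensors of shape (B, 256, H, W)
```

A block of this kind could be dropped between corresponding encoder stages of the RGB and depth branches, which is consistent with the plug-and-play use described in the abstract.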
Citation
Liu, Y., Yoshie, O., & Watanabe, H. (2023). Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13847 LNCS, pp. 378–397). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-26293-7_23