Abstract
The problem of semantic segmentation in computer vision has regained researchers' interest thanks to advances in deep learning. This research investigates multi-modal semantic segmentation on images with two modalities, RGB and depth, taking RGB-D images as input. For cross-modal calibration and fusion, this research presents a novel FFCA module, which improves segmentation results by acquiring complementary information from the two modalities. The module is plug-and-play and can be integrated into existing neural networks. To validate it, a multi-modal semantic segmentation network named FFCANet has been designed, featuring a dual-branch encoder and a global context module built on the classic combination of a ResNet backbone with DeepLabV3+. Compared with the baseline, the proposed model substantially improves accuracy on the semantic segmentation task.
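The abstract does not spell out the internals of the FFCA module, so the sketch below is only a hypothetical illustration of what a plug-and-play cross-modal fusion attention block for a dual-branch RGB-D encoder might look like, assuming a channel-attention-style design. The class name FFCAModule, the reduction ratio, and the sum-fusion step are assumptions for illustration, not the authors' published implementation.

```python
import torch
import torch.nn as nn

class FFCAModule(nn.Module):
    """Hypothetical cross-modal fusion attention block (illustrative sketch only).

    Takes same-shaped RGB and depth feature maps, derives channel-attention
    weights from their concatenated global descriptors, and returns
    recalibrated, fused features.
    """
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Shared MLP producing per-modality channel weights from the
        # concatenated global descriptors of both branches.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, (2 * channels) // reduction),
            nn.ReLU(inplace=True),
            nn.Linear((2 * channels) // reduction, 2 * channels),
            nn.Sigmoid(),
        )

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = rgb.shape
        # Global average-pooled descriptors of both modalities, concatenated.
        desc = torch.cat([self.pool(rgb), self.pool(depth)], dim=1).flatten(1)
        w = self.mlp(desc).view(b, 2 * c, 1, 1)
        w_rgb, w_depth = w[:, :c], w[:, c:]
        # Recalibrate each branch with cross-modal weights, then fuse by sum.
        return rgb * w_rgb + depth * w_depth

# Usage sketch: fuse ResNet stage outputs of equal shape from the two branches.
# fuse = FFCAModule(channels=256)
# fused = fuse(rgb_feat, depth_feat)   # both tensors of shape (B, 256, H, W)
```

A block of this kind could be dropped between corresponding encoder stages of the RGB and depth branches, which is consistent with the plug-and-play use described in the abstract.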
Citation
Liu, Y., Yoshie, O., & Watanabe, H. (2023). Application of Multi-modal Fusion Attention Mechanism in Semantic Segmentation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13847 LNCS, pp. 378–397). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-26293-7_23