3M2RNet: Multi-Modal Multi-Resolution Refinement Network for Semantic Segmentation

Fahimeh Fooladgar; Shohreh Kasaei

Conference Proceedings

Advances in Intelligent Systems and Computing, vol. 944 (2020), pp. 544-557

DOI: 10.1007/978-3-030-17798-0_44

Abstract

Semantic segmentation of images is one of the most important steps towards 3D scene understanding, which is a crucial requirement in computer vision and robotic applications. With the availability of RGB-D cameras, it is desirable to improve the accuracy of scene understanding by exploiting depth along with appearance features. One of the main problems in RGB-D semantic segmentation is how to fuse these two modalities so as to benefit from both the common and the modality-specific features. Recently, methods that employ deep convolutional neural networks have reached state-of-the-art results in dense prediction. These networks are usually used as both feature extractors and classifiers, trained end to end. In this paper, an efficient multi-modal multi-resolution refinement network is proposed to exploit the advantages of the RGB and depth modalities as much as possible. The refinement network is an encoder-decoder network with two separate encoder branches and one decoder stream. Deep networks build abstract feature representations through down-sampling operations in the encoder branches, which causes some loss of resolution. This loss must therefore be compensated in the decoder branch. In the modality fusion process, a weighted fusion of the “clean” information paths at each resolution level of the two encoders is passed through skip connections with the aid of an identity mapping function. Extensive experimental results on three challenging datasets, NYU-V2, SUN RGB-D, and Stanford 2D-3D-S, show that the proposed network obtains state-of-the-art results.
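The data flow described in the abstract (two encoder branches, one decoder stream, and a weighted fusion carried over identity skip connections at each resolution level) can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no layer details, so the example below assumes plain 2x2 average pooling in place of the convolutional encoders, nearest-neighbour up-sampling in the decoder, and fixed scalar fusion weights where the real network would presumably learn them. All function names (`downsample`, `upsample`, `weighted_fuse`, `encode`, `decode`) are hypothetical.

```python
from typing import List

Grid = List[List[float]]  # a single-channel 2D feature map


def downsample(x: Grid) -> Grid:
    """2x2 average pooling (assumed stand-in for an encoder stage).
    Requires even height and width."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4.0
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def upsample(x: Grid) -> Grid:
    """Nearest-neighbour 2x up-sampling (assumed decoder operation)."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def weighted_fuse(rgb: Grid, depth: Grid, w_rgb: float, w_depth: float) -> Grid:
    """Element-wise weighted sum of the two modalities at one resolution."""
    return [[w_rgb * r + w_depth * d for r, d in zip(r_row, d_row)]
            for r_row, d_row in zip(rgb, depth)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum; the skip input is added unchanged (identity mapping)."""
    return [[u + v for u, v in zip(a_row, b_row)] for a_row, b_row in zip(a, b)]


def encode(x: Grid, levels: int) -> List[Grid]:
    """Return the feature maps at every resolution, finest first."""
    feats = [x]
    for _ in range(levels):
        feats.append(downsample(feats[-1]))
    return feats


def decode(rgb_feats: List[Grid], depth_feats: List[Grid],
           w_rgb: float = 0.5, w_depth: float = 0.5) -> Grid:
    """Single decoder stream: start from the fused coarsest level, then at each
    finer level up-sample and add the fused skip connection of that level."""
    out = weighted_fuse(rgb_feats[-1], depth_feats[-1], w_rgb, w_depth)
    for level in range(len(rgb_feats) - 2, -1, -1):
        skip = weighted_fuse(rgb_feats[level], depth_feats[level], w_rgb, w_depth)
        out = add(upsample(out), skip)
    return out


# Toy inputs: a constant 4x4 "RGB" map and a constant 4x4 "depth" map.
rgb = [[1.0] * 4 for _ in range(4)]
depth = [[3.0] * 4 for _ in range(4)]
result = decode(encode(rgb, levels=2), encode(depth, levels=2))
```

With these toy inputs the decoder output has the same 4x4 resolution as the inputs, which is the point of the refinement path: each skip connection reintroduces the finer-resolution information that down-sampling removed from the encoder features.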

Cite

Fooladgar, F., & Kasaei, S. (2020). 3M2RNet: Multi-Modal Multi-Resolution Refinement Network for Semantic Segmentation. In Advances in Intelligent Systems and Computing (Vol. 944, pp. 544–557). Springer Verlag. https://doi.org/10.1007/978-3-030-17798-0_44
