DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation

Abstract

Indoor scene segmentation is a challenging task in computer vision. We propose an indoor scene segmentation framework, called DMFNet, that incorporates RGB and complementary depth information. We use squeeze-and-excitation residual networks as encoders to extract features from the RGB and depth data in parallel, and fuse them in the decoder. Multiple average pooling layers and transposed convolution layers process the encoder outputs and merge them across several decoder layers. To optimize the network parameters, we adopt a pyramid supervision training scheme, which applies supervised learning at multiple decoder layers to prevent vanishing gradients. We evaluate the proposed DMFNet on the NYU Depth V2 dataset, which consists of 1449 cluttered indoor scenes, and achieve competitive results compared to state-of-the-art methods.
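
The fusion-plus-pyramid-supervision design described above can be illustrated with a small sketch. The PyTorch code below is a minimal illustration, not the paper's implementation: the stand-in two-layer encoders, element-wise additive fusion, two decoder stages, and plain cross-entropy side losses are all assumptions made here for brevity; DMFNet itself uses SE-ResNet encoders and the layer configuration described in the paper.

# Minimal sketch of two-stream RGB-D fusion with pyramid supervision.
# All architectural specifics below are assumptions for illustration only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderStage(nn.Module):
    """Upsample 2x with a transposed convolution and emit a supervised side output."""
    def __init__(self, in_ch, out_ch, num_classes):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)
        self.refine = nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1)
        self.side = nn.Conv2d(out_ch, num_classes, kernel_size=1)  # per-stage head

    def forward(self, x):
        x = F.relu(self.up(x))
        x = F.relu(self.refine(x))
        return x, self.side(x)

class ToyFusionNet(nn.Module):
    """Two-stream encoder (RGB + depth) fused by addition; pyramid-supervised decoder."""
    def __init__(self, num_classes=40):
        super().__init__()
        # Stand-in encoders; the paper uses SE-ResNet backbones instead.
        self.rgb_enc = nn.Sequential(nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU())
        self.dep_enc = nn.Sequential(nn.Conv2d(1, 64, 3, stride=2, padding=1), nn.ReLU())
        self.pool = nn.AvgPool2d(2)  # average pooling over the fused encoder output
        self.stage1 = DecoderStage(64, 32, num_classes)
        self.stage2 = DecoderStage(32, 16, num_classes)

    def forward(self, rgb, depth):
        fused = self.rgb_enc(rgb) + self.dep_enc(depth)  # element-wise fusion (assumed)
        fused = self.pool(fused)
        x, side1 = self.stage1(fused)
        x, side2 = self.stage2(x)
        return side1, side2  # the finest side output serves as the final prediction

def pyramid_loss(side_outputs, target):
    """Sum cross-entropy over all side outputs, each resized to label resolution."""
    loss = 0.0
    for logits in side_outputs:
        logits = F.interpolate(logits, size=target.shape[-2:], mode="bilinear",
                               align_corners=False)
        loss = loss + F.cross_entropy(logits, target)
    return loss

if __name__ == "__main__":
    net = ToyFusionNet(num_classes=40)
    rgb = torch.randn(2, 3, 64, 64)
    depth = torch.randn(2, 1, 64, 64)
    labels = torch.randint(0, 40, (2, 64, 64))
    print(pyramid_loss(net(rgb, depth), labels))

Summing the losses from every side output, as pyramid_loss does, keeps a gradient signal flowing directly into the earlier decoder stages, which is the stated purpose of the pyramid supervision scheme.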

Citation (APA)

Yuan, J., Zhou, W., & Luo, T. (2019). DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation. IEEE Access, 7, 169350–169358. https://doi.org/10.1109/ACCESS.2019.2955101
