Monocular depth estimation is an ill-posed problem because infinitely many 3D scenes can project to the same 2D image. Most recent methods rely on image-level information extracted by deep convolutional neural networks, but training these networks can suffer from slow convergence and accuracy degradation, especially as networks become deeper and use more feature channels. Based on an encoder-decoder framework, we propose a novel Residual DenseASPP Network. In this network, we categorize features as low-, mid-, and high-level vision features and use two kinds of skip connection to learn useful features at particular layers: feature concatenation in the dense block generates more features within the same layer, and feature summation in the residual block strengthens the backward gradient. The experimental results show that high-level vision features require more channels, provided by feature concatenation, while low- and mid-level vision features need better convergence, provided by feature summation. Our proposed approach achieves state-of-the-art performance on both the NYUv2 and Make3D datasets.
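The difference between the two kinds of skip connection described above can be illustrated with a minimal sketch. It is not the authors' implementation: `toy_layer` is a hypothetical stand-in for a convolutional layer, and feature maps are represented as plain lists of channels purely to show how channel counts behave. The dense-style skip stacks new channels onto the input (concatenation), so the channel count grows; the residual-style skip adds the layer output to the input element-wise (summation), so the channel count is unchanged.

```python
from typing import List

Channel = List[float]      # one flattened feature channel
Features = List[Channel]   # a feature map as a list of channels


def toy_layer(x: Features, out_channels: int) -> Features:
    """Hypothetical stand-in for a conv layer (an assumption for illustration).

    Each output channel is the element-wise mean of the input channels,
    scaled by a fixed per-channel factor (1, 2, 3, ...).
    """
    n = len(x[0])
    mean = [sum(ch[i] for ch in x) / len(x) for i in range(n)]
    return [[(k + 1) * v for v in mean] for k in range(out_channels)]


def dense_skip(x: Features, growth: int) -> Features:
    """Dense-block style skip: concatenate new channels onto the input."""
    return x + toy_layer(x, growth)


def residual_skip(x: Features) -> Features:
    """Residual-block style skip: sum the layer output with the input."""
    fx = toy_layer(x, len(x))
    return [[a + b for a, b in zip(cx, cf)] for cx, cf in zip(x, fx)]


x = [[1.0, 2.0], [3.0, 4.0]]          # 2 channels, 2 positions each
dense_out = dense_skip(x, growth=3)   # channel count grows from 2 to 5
res_out = residual_skip(x)            # channel count stays at 2
print(len(dense_out), len(res_out))
```

In the dense case the input channels pass through untouched alongside the new ones, which is how a dense block accumulates more features in the same layer. In the residual case the identity path gives gradients a direct route back to earlier layers, which is the usual motivation for summation-based skips.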
CITATION STYLE
Wu, K., Zhang, S., & Xie, Z. (2020). Monocular Depth Prediction with Residual DenseASPP Network. IEEE Access, 8, 129899–129910. https://doi.org/10.1109/ACCESS.2020.3006704