Self-supervised depth estimation has shown great promise in inferring 3D structure from purely unannotated images. However, its performance usually drops when it is trained on images with changing brightness and moving objects. In this paper, we address this issue by enhancing the robustness of the self-supervised paradigm with a set of image-based and geometry-based constraints. Our contributions are threefold: 1) we propose a gradient-based robust photometric loss that suppresses the false supervisory signals caused by brightness changes; 2) we filter out unreliable regions that violate the rigid-scene assumption with a novel combined selective mask, computed during the forward pass of the network by leveraging inter-loss consistency and loss-gradient consistency; and 3) we constrain the motion estimation network to produce cross-frame consistent motions through a triplet-based cycle consistency constraint. Extensive experiments on the KITTI, Cityscapes, and Make3D datasets demonstrate that the proposed method effectively handles complex scenes with changing brightness and object motion. Both qualitative and quantitative results show that it outperforms state-of-the-art methods.
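The abstract only names the gradient-based photometric loss without giving its formula; the sketch below is a minimal, hypothetical illustration of the general idea (penalizing differences between image gradients of the target frame and the warped source frame, rather than raw intensities), not the authors' exact formulation. The function names and the PyTorch setting are assumptions.

```python
import torch

def image_gradients(img):
    # Finite-difference spatial gradients of a (B, C, H, W) image batch.
    dx = img[..., :, 1:] - img[..., :, :-1]  # horizontal differences
    dy = img[..., 1:, :] - img[..., :-1, :]  # vertical differences
    return dx, dy

def gradient_photometric_loss(target, warped):
    # L1 penalty on image gradients rather than raw intensities: an
    # additive brightness change shifts intensities but leaves local
    # gradients nearly unchanged, so this term is less sensitive to
    # lighting variation than a plain photometric difference.
    tdx, tdy = image_gradients(target)
    wdx, wdy = image_gradients(warped)
    return (tdx - wdx).abs().mean() + (tdy - wdy).abs().mean()

# Hypothetical usage inside a view-synthesis training loop:
# loss = gradient_photometric_loss(I_target, I_source_warped)
```

Comparing gradients rather than intensities is one common way to gain invariance to global brightness shifts; the paper's actual loss may combine this with further robustness terms.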
Citation: Li, R., He, X., Zhu, Y., Li, X., Sun, J., & Zhang, Y. (2020). Enhancing Self-supervised Monocular Depth Estimation via Incorporating Robust Constraints. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 3108–3117). Association for Computing Machinery. https://doi.org/10.1145/3394171.3413706