This paper presents a new GAN-based deep learning framework for estimating absolute scale-aware depth and ego-motion from monocular images in a completely unsupervised manner. The proposed architecture uses two separate generators to learn the distributions of depth and pose data for a given input image sequence. The generated depth and pose are then evaluated by a patch-based discriminator that compares the reconstructed image with the corresponding actual image. This patch-based GAN (PatchGAN) is shown to detect high-frequency local structural defects in the reconstructed image, thereby improving the accuracy of the overall depth and pose estimation. Unlike conventional GANs, the proposed architecture trains the whole network on a conditioned version of the generator's input and output. The resulting framework is shown to outperform all existing deep networks in this field, beating the current state-of-the-art method by 8.7% in absolute error and by 5.2% in the RMSE metric. To the best of our knowledge, this is the first deep-network-based model to estimate both depth and pose simultaneously using a conditional patch-based GAN paradigm. The efficacy of the proposed approach is demonstrated through rigorous ablation studies and an exhaustive performance comparison on the popular KITTI outdoor driving dataset.
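The core idea of conditioning a patch-based discriminator on the input view while it scores the reconstructed view can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the class name PatchDiscriminator, the layer widths, and the 128x416 input resolution are assumptions chosen for illustration.

```python
# Minimal, hypothetical sketch of a conditional PatchGAN discriminator.
# It scores local patches of a (real or reconstructed) target view,
# conditioned on the corresponding source view by channel concatenation.
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Outputs a grid of real/fake logits, one per local image patch."""
    def __init__(self, in_channels=6, base=64):
        super().__init__()
        def block(c_in, c_out, stride, norm=True):
            layers = [nn.Conv2d(c_in, c_out, kernel_size=4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.InstanceNorm2d(c_out))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.net = nn.Sequential(
            *block(in_channels, base, 2, norm=False),  # 3 source + 3 target channels
            *block(base, base * 2, 2),
            *block(base * 2, base * 4, 2),
            *block(base * 4, base * 8, 1),
            nn.Conv2d(base * 8, 1, kernel_size=4, stride=1, padding=1),  # patch logits
        )

    def forward(self, source_img, target_img):
        # Conditioning: the discriminator always sees the source view
        # alongside the target view it is asked to judge.
        return self.net(torch.cat([source_img, target_img], dim=1))

# Usage sketch: the view synthesised from the predicted depth and pose is scored
# against the actual frame, both conditioned on the current frame.
disc = PatchDiscriminator()
current = torch.randn(1, 3, 128, 416)             # assumed KITTI-like resolution
actual_next = torch.randn(1, 3, 128, 416)
reconstructed_next = torch.randn(1, 3, 128, 416)  # warped via predicted depth/pose
real_logits = disc(current, actual_next)
fake_logits = disc(current, reconstructed_next)   # each logit covers one local patch
```

Because each output logit has a limited receptive field, the discriminator penalises high-frequency structural defects patch by patch rather than judging the reconstruction with a single global score.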
CITATION STYLE
Vankadari, M., Kumar, S., Majumder, A., & Das, K. (2019). Unsupervised learning of monocular depth and ego-motion using conditional patchGANs. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 5677–5684). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/787