Scale-Aware Visual-Inertial Depth Estimation and Odometry Using Monocular Self-Supervised Learning

This article is free to access.

Abstract

Scale ambiguity is a central problem for real-world applications that rely on a single monocular camera. Self-supervised data-driven approaches cannot resolve this ambiguity without additional data that carries scale information, so state-of-the-art deep-learning-based methods learn the scale from additional sensor measurements. The inertial measurement unit (IMU) is a popular choice on mobile platforms because it is lightweight and inexpensive. However, unlike supervised learning, which can learn the scale from ground-truth information, learning the scale from an IMU is challenging in a self-supervised setting. We propose a scale-aware monocular visual-inertial depth estimation and odometry method with end-to-end training. To learn the scale from IMU measurements end to end in the monocular self-supervised setup, we propose a new loss function, the preintegration loss, which trains scale-aware ego-motion by comparing the ego-motion integrated from IMU measurements with the predicted ego-motion. Because gravity and bias must be compensated to obtain ego-motion by integrating IMU measurements, we design a network that predicts the gravity and the bias in addition to the ego-motion and the depth map. We compare the proposed method with state-of-the-art methods on the popular outdoor driving dataset KITTI and on an author-collected indoor driving dataset. On KITTI, the proposed method is competitive with state-of-the-art monocular depth estimation and odometry methods: it achieves a root-mean-square error of 5.435 m on the KITTI Eigen split and an absolute trajectory error of 22.46 m and 0.2975 degrees on KITTI odometry sequence 09. Unlike other monocular methods, which are accurate only up to scale, the proposed method estimates metric-scaled depth and camera poses. Additional experiments on the author-collected indoor driving dataset qualitatively confirm the accuracy of the metric depth and metric pose estimates.
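
The idea behind the preintegration loss can be illustrated with a minimal NumPy sketch. This is not the authors' formulation: the function names (`so3_exp`, `integrate_imu`, `preintegration_style_loss`), the simple Euler-style integration, the use of an initial velocity `v0`, and the unweighted sum of translation and rotation errors are all assumptions made for illustration. Gravity and the biases, which the abstract says are predicted by the network, are passed in here as plain arrays expressed in the frame of the first image.

```python
import numpy as np

def so3_exp(phi):
    """Rodrigues' formula: rotation vector (rad) -> 3x3 rotation matrix."""
    theta = np.linalg.norm(phi)
    if theta < 1e-12:
        return np.eye(3)
    k = phi / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def integrate_imu(gyro, accel, dt, gravity, gyro_bias, accel_bias, v0):
    """Integrate IMU samples between two frames (hypothetical helper).

    gyro, accel: (N, 3) arrays of raw measurements in the body frame.
    gravity: gravity vector in the first frame, e.g. [0, 0, -9.81].
    Returns the relative rotation and the metric displacement.
    """
    R = np.eye(3)
    v = np.asarray(v0, dtype=float)
    p = np.zeros(3)
    for w, a in zip(gyro, accel):
        # Remove the bias, rotate into the first frame, add gravity back.
        a_first = R @ (a - accel_bias) + gravity
        p = p + v * dt + 0.5 * a_first * dt ** 2
        v = v + a_first * dt
        R = R @ so3_exp((w - gyro_bias) * dt)
    return R, p

def preintegration_style_loss(R_pred, t_pred, R_imu, p_imu):
    """Translation error (m) plus rotation error (rad); weights assumed equal."""
    trans_err = np.linalg.norm(t_pred - p_imu)
    cos_angle = np.clip((np.trace(R_pred.T @ R_imu) - 1.0) / 2.0, -1.0, 1.0)
    return trans_err + np.arccos(cos_angle)

# Example: 1 s of constant 1 m/s^2 forward acceleration, starting at rest.
n, dt = 100, 0.01
gravity = np.array([0.0, 0.0, -9.81])
gyro = np.zeros((n, 3))
accel = np.tile([1.0, 0.0, 9.81], (n, 1))  # accelerometer also senses gravity
R_imu, p_imu = integrate_imu(gyro, accel, dt, gravity,
                             np.zeros(3), np.zeros(3), np.zeros(3))
loss_metric = preintegration_style_loss(np.eye(3), np.array([0.5, 0.0, 0.0]),
                                        R_imu, p_imu)
loss_half_scale = preintegration_style_loss(np.eye(3), np.array([0.25, 0.0, 0.0]),
                                            R_imu, p_imu)
```

In this toy case the integrated displacement is about 0.5 m, so a predicted translation with the correct direction but half the magnitude is penalized while the metric-scale prediction is not. That is the sense in which comparing against integrated IMU motion can supply the scale that a purely photometric self-supervised loss leaves undetermined.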

Cite

Lee, C., Kim, C., Kim, P., Lee, H., & Jin Kim, H. (2023). Scale-Aware Visual-Inertial Depth Estimation and Odometry Using Monocular Self-Supervised Learning. IEEE Access, 11, 24087–24102. https://doi.org/10.1109/ACCESS.2023.3252884
