We propose a novel architecture for learning camera poses from image sequences with an extended 2D Long Short-Term Memory (LSTM). Unlike most previous deep-learning-based Visual Odometry (VO) methods, our model predicts the pose of each frame using temporal information from the image sequence by adopting a forward-backward process. In addition, we use 3D tensors as the basic structure to encode spatial information. The network learns poses in a bottom-up manner by coupling local and global constraints. Experiments on the public KITTI benchmark demonstrate that our architecture outperforms state-of-the-art end-to-end methods in terms of camera motion prediction and is comparable with model-based methods. The network also generalizes well to the Málaga dataset without extra training or fine-tuning.
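The forward-backward idea can be sketched as a bidirectional recurrence: each frame's pose is regressed from hidden states that summarize both past and future frames. The following is a minimal conceptual sketch, not the paper's actual model: it uses a plain tanh recurrence in place of the LSTM cells, illustrative dimensions, and a hypothetical linear 6-DoF pose head; all function names and sizes are assumptions for illustration.

```python
import math
import random

def rnn_pass(features, W, U, b, reverse=False):
    # Plain tanh recurrence (LSTM stand-in) over per-frame feature vectors.
    # features: T x D list of lists; returns T x H hidden states indexed
    # by original frame order, regardless of traversal direction.
    T, D, H = len(features), len(features[0]), len(b)
    h = [0.0] * H
    out = [None] * T
    order = reversed(range(T)) if reverse else range(T)
    for t in order:
        h = [math.tanh(sum(features[t][d] * W[d][j] for d in range(D))
                       + sum(h[i] * U[i][j] for i in range(H)) + b[j])
             for j in range(H)]
        out[t] = h
    return out

def rand_mat(rows, cols, rng):
    # Small random weights for the illustrative sketch.
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
T, D, H = 5, 8, 4                      # 5 frames, 8-dim features, 4 hidden units
feats = rand_mat(T, D, rng)            # stand-in for per-frame CNN features

# Forward pass carries past context; backward pass carries future context.
Wf, Uf, bf = rand_mat(D, H, rng), rand_mat(H, H, rng), [0.0] * H
Wb, Ub, bb = rand_mat(D, H, rng), rand_mat(H, H, rng), [0.0] * H
h_fwd = rnn_pass(feats, Wf, Uf, bf)
h_bwd = rnn_pass(feats, Wb, Ub, bb, reverse=True)

# Hypothetical pose head: concatenated states -> 6-DoF pose per frame
# (3 rotation + 3 translation components).
W_pose = rand_mat(2 * H, 6, rng)
poses = [[sum((h_fwd[t] + h_bwd[t])[i] * W_pose[i][k] for i in range(2 * H))
          for k in range(6)]
         for t in range(T)]
print(len(poses), len(poses[0]))  # -> 5 6
```

The key design point sketched here is that, unlike a purely causal recurrence, each per-frame pose estimate draws on the whole sequence, which is what the forward-backward process in the abstract provides.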
Xue, F., Wang, X., Wang, Q., Wang, J., & Zha, H. (2019). Visual odometry with deep bidirectional recurrent neural networks. In Lecture Notes in Computer Science, vol. 11859, pp. 235–246. Springer. https://doi.org/10.1007/978-3-030-31726-3_20