Stereo Depth Estimation with Echoes

Abstract

Stereo depth estimation performs well in locally textured regions, while echoes yield good depth estimates in globally textureless regions; the two modalities therefore complement each other. Motivated by this reciprocal relationship, in this paper we propose StereoEchoes, an end-to-end framework for stereo depth estimation with echoes. A Cross-modal Volume Refinement module transfers the complementary knowledge of the audio modality to the visual modality at the feature level. A Relative Depth Uncertainty Estimation module further yields pixel-wise confidence for multimodal depth fusion in the output space. As there is no dataset for this new problem, we introduce two Stereo-Echo datasets, Stereo-Replica and Stereo-Matterport3D, for the first time. Remarkably, we show empirically that StereoEchoes outperforms stereo depth estimation methods on Stereo-Replica and Stereo-Matterport3D by 25%/13.8% in RMSE, and surpasses the state-of-the-art audio-visual depth prediction method by 25.3%/42.3% in RMSE.
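The abstract's idea of fusing two depth maps with pixel-wise confidence can be illustrated with a minimal sketch. This is not the paper's actual Relative Depth Uncertainty Estimation module; it simply assumes each modality provides an unnormalized per-pixel confidence score and blends the depth maps with a per-pixel softmax over those scores. The function name `fuse_depths` and all inputs are hypothetical.

```python
import numpy as np

def fuse_depths(stereo_depth, echo_depth, stereo_conf, echo_conf):
    """Blend two H x W depth maps using per-pixel confidence weights.

    Confidences are unnormalized scores; they are softmax-normalized
    per pixel so the weights of the two modalities sum to 1 everywhere.
    (Illustrative sketch only, not the paper's fusion module.)
    """
    scores = np.stack([stereo_conf, echo_conf])          # (2, H, W)
    scores = scores - scores.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=0, keepdims=True)        # per-pixel softmax
    depths = np.stack([stereo_depth, echo_depth])        # (2, H, W)
    return (weights * depths).sum(axis=0)                # fused (H, W) depth
```

With equal confidences the fusion reduces to a per-pixel average; as one modality's confidence grows, the fused depth smoothly approaches that modality's estimate.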

Citation (APA)

Zhang, C., Tian, K., Ni, B., Meng, G., Fan, B., Zhang, Z., & Pan, C. (2022). Stereo Depth Estimation with Echoes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13687 LNCS, pp. 496–513). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19812-0_29
