Pixel-Perfect Structure-from-Motion with Featuremetric Refinement

Abstract

Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per image once and for all, which can yield poorly localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses in a post-processing step. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowdsourced localization at scale. Our code is publicly available at github.com/cvg/pixel-perfect-sfm as an add-on to the popular SfM software COLMAP.
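To make the idea of "optimizing a featuremetric error" concrete, here is a minimal toy sketch of keypoint adjustment, not the authors' implementation. It assumes a single two-view track in which the observation in view A is held fixed and only the keypoint in view B moves, a plain squared (non-robust) loss, and a damped Gauss-Newton step. Dense features are sampled at sub-pixel locations by bilinear interpolation. The functions `bilinear`, `refine_keypoint` and `make_map` are hypothetical names introduced only for illustration, and the 2-channel "features" are synthetic stand-ins for CNN feature maps.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical dense feature map layout: fmap[row][col][channel].
FeatureMap = List[List[List[float]]]


def bilinear(fmap: FeatureMap, x: float, y: float) -> Tuple[List[float], List[float], List[float]]:
    """Sample the feature map at sub-pixel (x, y); return feature, d/dx and d/dy."""
    h, w = len(fmap), len(fmap[0])
    # Clamp so the 2x2 interpolation neighbourhood stays inside the map.
    x = min(max(x, 0.0), w - 1 - 1e-6)
    y = min(max(y, 0.0), h - 1 - 1e-6)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    ax, ay = x - x0, y - y0
    f00, f01 = fmap[y0][x0], fmap[y0][x0 + 1]
    f10, f11 = fmap[y0 + 1][x0], fmap[y0 + 1][x0 + 1]
    feat, dfdx, dfdy = [], [], []
    for c in range(len(f00)):
        top = (1 - ax) * f00[c] + ax * f01[c]
        bot = (1 - ax) * f10[c] + ax * f11[c]
        feat.append((1 - ay) * top + ay * bot)
        dfdx.append((1 - ay) * (f01[c] - f00[c]) + ay * (f11[c] - f10[c]))
        dfdy.append(bot - top)
    return feat, dfdx, dfdy


def refine_keypoint(fmap: FeatureMap, p_init: Sequence[float], f_ref: Sequence[float],
                    iters: int = 10, damping: float = 1e-6) -> Tuple[float, float]:
    """Move a keypoint so its sampled feature matches f_ref (damped Gauss-Newton)."""
    x, y = p_init
    for _ in range(iters):
        feat, gx, gy = bilinear(fmap, x, y)
        r = [a - b for a, b in zip(feat, f_ref)]  # featuremetric residual
        # Normal equations (J^T J + damping * I) delta = -J^T r, with J = [gx gy].
        a = sum(v * v for v in gx) + damping
        b = sum(u * v for u, v in zip(gx, gy))
        d = sum(v * v for v in gy) + damping
        g1 = sum(u * v for u, v in zip(gx, r))
        g2 = sum(u * v for u, v in zip(gy, r))
        det = a * d - b * b
        if det == 0.0:
            break
        dx = -(d * g1 - b * g2) / det
        dy = -(-b * g1 + a * g2) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < 1e-8:  # converged
            break
    return x, y


def make_map(h: int, w: int, shift_x: float, shift_y: float) -> FeatureMap:
    """Toy 2-channel features that vary linearly with position, offset by a shift."""
    return [[[col - shift_x, row - shift_y] for col in range(w)] for row in range(h)]


if __name__ == "__main__":
    fmap_a = make_map(8, 8, 0.0, 0.0)
    fmap_b = make_map(8, 8, 0.5, -0.3)  # view B is offset by (0.5, -0.3) px
    f_ref, _, _ = bilinear(fmap_a, 3.0, 4.0)  # fixed observation in view A
    # Start from a noisy detection at (4, 3); should end up close to (3.5, 3.7).
    print(refine_keypoint(fmap_b, (4.0, 3.0), f_ref))
```

Because the toy features are linear in position, the problem is quadratic and converges in one or two steps. Real learned features are not, which is why a practical system would add a robust loss and run the optimization jointly over all observations of a track rather than against a single fixed reference.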

Citation (APA)

Lindenberger, P., Sarlin, P. E., Larsson, V., & Pollefeys, M. (2021). Pixel-Perfect Structure-from-Motion with Featuremetric Refinement. In Proceedings of the IEEE International Conference on Computer Vision (pp. 5967–5977). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV48922.2021.00593