Abstract
Finding local features that are repeatable across multiple views is a cornerstone of sparse 3D reconstruction. The classical image matching paradigm detects keypoints per-image once and for all, which can yield poorly-localized features and propagate large errors to the final geometry. In this paper, we refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views: we first adjust the initial keypoint locations prior to any geometric estimation, and subsequently refine points and camera poses as a post-processing step. This refinement is robust to large detection noise and appearance changes, as it optimizes a featuremetric error based on dense features predicted by a neural network. This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors, challenging viewing conditions, and off-the-shelf deep features. Our system easily scales to large image collections, enabling pixel-perfect crowdsourced localization at scale. Our code is publicly available at github.com/cvg/pixel-perfect-sfm as an add-on to the popular SfM software COLMAP.
Cite
Lindenberger, P., Sarlin, P. E., Larsson, V., & Pollefeys, M. (2021). Pixel-Perfect Structure-from-Motion with Featuremetric Refinement. In Proceedings of the IEEE International Conference on Computer Vision (pp. 5967–5977). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/ICCV48922.2021.00593