This chapter describes a technique that can geo-localize arbitrary 2D depictions of architectural sites, including drawings, paintings, and historical photographs. This is achieved by aligning the input depiction with a 3D model of the corresponding site. The task is challenging because the appearance and scene structure in the 2D depictions can differ greatly from the appearance and geometry of the 3D model, e.g., due to the specific rendering style, drawing errors, age, lighting, or seasonal changes. In addition, we face a hard search problem: the number of possible alignments of the depiction to a set of 3D models from different architectural sites is huge. To address these issues, we develop a compact representation of complex 3D scenes. 3D models of several scenes are represented by a set of discriminative visual elements that are automatically learnt from rendered views. As in object detection, the set of visual elements, as well as the weights of individual features for each element, are learnt in a discriminative fashion. We show that the learnt visual elements are reliably matched in 2D depictions of the scene despite large variations in rendering style (e.g., watercolor, sketch, and historical photograph) and structural changes (e.g., missing scene parts and large occluders). We demonstrate that the proposed approach can automatically identify the correct architectural site as well as recover an approximate viewpoint of historical photographs and paintings with respect to the 3D model of the site.
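The core idea of a discriminatively trained visual element can be illustrated with an exemplar-LDA-style linear detector, one common way to learn such weights: a patch descriptor from a rendered view is whitened by the mean and covariance of generic background patches, yielding a detector that down-weights features common to all scenes. The sketch below uses synthetic random descriptors in place of real HOG features of rendered views; all data and dimensions are hypothetical, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Generic patch descriptors standing in for features of many rendered
# views (synthetic stand-in data; the real system uses HOG on renderings).
D = 16                                       # descriptor dimensionality
generic = rng.normal(size=(500, D))
mu = generic.mean(axis=0)                    # background mean
sigma = np.cov(generic, rowvar=False) + 1e-3 * np.eye(D)  # regularized covariance

# One visual element: an exemplar descriptor x from a rendered view,
# turned into a linear detector via whitening, w = Sigma^{-1} (x - mu).
x = mu + 3.0 * rng.normal(size=D)
w = np.linalg.solve(sigma, x - mu)

# Candidate patches from a 2D depiction: background clutter, plus one
# patch (index 17) depicting the same structure under style "noise".
candidates = rng.normal(size=(50, D))
candidates[17] = x + 0.2 * rng.normal(size=D)

# Score every candidate; the whitened dot product is high only where
# the depiction patch matches the element despite the added noise.
scores = candidates @ w
best = int(np.argmax(scores))
```

Because the covariance is shared across all elements, each new detector costs only one linear solve, which is what makes representing many sites by large sets of such elements tractable.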
Citation:
Aubry, M., Russell, B., & Sivic, J. (2016). Visual geo-localization of non-photographic depictions via 2D–3D alignment. In Advances in Computer Vision and Pattern Recognition (pp. 255–275). Springer. https://doi.org/10.1007/978-3-319-25781-5_14