In this paper, we address a specific use case of wearable or hand-held camera technology: indoor navigation. We explore the possibility of crowdsourcing navigational data in the form of video sequences captured from wearable or hand-held cameras. Without using geometric inference techniques (such as SLAM), we test video data for navigational content and evaluate algorithms for extracting that content. We do not include tracking in this evaluation; our purpose is to explore the hypothesis that visual content, on its own, contains cues that can be mined to infer a person's location. We test this hypothesis by estimating the positional error distributions inferred during one journey with respect to other journeys along the same approximate path. The contributions of this work are threefold. First, we propose alternative methods for video feature extraction that identify candidate matches between query sequences and a database of sequences from journeys made at different times. Second, we suggest an evaluation methodology that estimates the error distributions of inferred position with respect to a ground truth; within this framework, we assess and compare standard approaches from the field of image retrieval, such as SIFT and HOG3D, for establishing associations between frames. Finally, we contribute a publicly available database comprising over 90,000 frames of video sequences with positional ground truth, acquired along more than 3 km of indoor journeys with a hand-held device (Nexus 4) and a wearable device (Google Glass).
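To make the frame-association step concrete, the sketch below scores a query frame against a database of frames using SIFT descriptors, one of the image-retrieval baselines the paper compares. It is a minimal illustration only, assuming OpenCV is available; the function name `best_matching_frame`, the brute-force matcher, and the ratio-test threshold are assumptions for illustration and not the authors' pipeline.

```python
# Minimal sketch: associate a query frame with a database frame via SIFT
# descriptor matching. Assumes greyscale numpy arrays and OpenCV >= 4.4.
import cv2


def best_matching_frame(query_frame, db_frames, ratio=0.75):
    """Return the index of the database frame with the most SIFT matches
    to the query frame, filtering matches with Lowe's ratio test."""
    sift = cv2.SIFT_create()
    matcher = cv2.BFMatcher(cv2.NORM_L2)

    _, q_desc = sift.detectAndCompute(query_frame, None)
    if q_desc is None:
        return None  # no keypoints found in the query frame

    best_idx, best_count = None, -1
    for idx, frame in enumerate(db_frames):
        _, d_desc = sift.detectAndCompute(frame, None)
        if d_desc is None:
            continue
        # k-nearest-neighbour matching followed by the ratio test
        knn = matcher.knnMatch(q_desc, d_desc, k=2)
        good = [p[0] for p in knn
                if len(p) == 2 and p[0].distance < ratio * p[1].distance]
        if len(good) > best_count:
            best_idx, best_count = idx, len(good)
    return best_idx
```

Given ground-truth positions for both journeys, the positional error for a query frame would then simply be the distance between its ground-truth location and that of the retrieved database frame, which is the quantity whose distribution the evaluation methodology estimates.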