We present a novel multi-modal fusion framework for non-sequential person detection, localization and identification from multiple views. Our goal is to process randomly accessed sections of video, either individual frames or small batches of frames, independently of one another. In this way, we aim to limit the error propagation that makes existing approaches unsuitable for fully autonomous tracking of multiple people in long video sequences. Our framework uses one or more trained classifiers to fuse multiple weak feature maps. We perform experimental validation on a challenging dataset and demonstrate that, depending on the provided feature maps, the framework can be used either solely to improve generic person detection or to enable simultaneous detection and recognition of individuals. Finally, we show that tracking-by-identification using the output of the proposed framework outperforms the state-of-the-art identification-by-tracking approach in terms of preserved track identities. © 2013 Springer-Verlag.
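To make the idea of fusing weak feature maps with a trained classifier more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the classifier or the feature maps, so everything here is an assumption: a logistic-regression classifier stands in for the trained classifier, the feature maps are synthetic noisy ground-plane grids, and all function names (`make_frame`, `train_fusion`, `fuse`, `localize`) are hypothetical. The sketch covers only generic detection and localization of a single person, not identification.

```python
import math
import random


def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_frame(rng, pos, size=16, n_maps=6, noise=0.3):
    """Hypothetical synthetic frame: n_maps noisy ground-plane maps, each with
    a unit response at the person's cell `pos` plus Gaussian noise."""
    return [[[(1.0 if (r, c) == pos else 0.0) + rng.gauss(0.0, noise)
              for c in range(size)] for r in range(size)]
            for _ in range(n_maps)]


def cell_features(maps, r, c):
    """Feature vector of one cell: one value per feature map."""
    return [m[r][c] for m in maps]


def training_set(rng, n_frames=200, size=16):
    """One positive and one random negative cell per synthetic frame."""
    samples, labels = [], []
    for _ in range(n_frames):
        pos = (rng.randrange(size), rng.randrange(size))
        maps = make_frame(rng, pos, size)
        neg = pos
        while neg == pos:
            neg = (rng.randrange(size), rng.randrange(size))
        samples += [cell_features(maps, *pos), cell_features(maps, *neg)]
        labels += [1.0, 0.0]
    return samples, labels


def train_fusion(samples, labels, lr=0.5, steps=300):
    """Fit logistic-regression weights by batch gradient descent.
    The last entry of the returned list is the bias."""
    k = len(samples[0])
    w = [0.0] * (k + 1)
    n = len(samples)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[k]) - y
            for i in range(k):
                grad[i] += err * x[i]
            grad[k] += err
        w = [wi - lr * g / n for wi, g in zip(w, grad)]
    return w


def fuse(maps, w):
    """Fuse the maps of one frame into a per-cell person-presence score."""
    k, size = len(maps), len(maps[0])
    return [[sigmoid(sum(w[i] * maps[i][r][c] for i in range(k)) + w[k])
             for c in range(size)] for r in range(size)]


def localize(fused):
    """Return the (row, col) cell with the highest fused score."""
    size = len(fused)
    return max(((r, c) for r in range(size) for c in range(size)),
               key=lambda rc: fused[rc[0]][rc[1]])


rng = random.Random(0)
weights = train_fusion(*training_set(rng))

# Build ten test frames, then visit them in shuffled order: each frame is
# processed on its own, so no state is carried from one frame to the next.
test_frames = []
for _ in range(10):
    pos = (rng.randrange(16), rng.randrange(16))
    test_frames.append((pos, make_frame(rng, pos)))
order = list(range(len(test_frames)))
rng.shuffle(order)
results = {t: localize(fuse(test_frames[t][1], weights)) for t in order}
hits = sum(results[t] == test_frames[t][0] for t in order)
print("correctly localized frames:", hits, "of", len(test_frames))
```

Because `fuse` and `localize` depend only on the maps of the frame they are given, a wrong result in one frame cannot affect any other frame, which is the property the non-sequential design aims for. A real system would replace the synthetic maps with actual per-view or per-modality feature maps and would need a classifier and training procedure suited to those inputs.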
Mandeljc, R., Kovačič, S., Kristan, M., & Perš, J. (2013). Non-sequential multi-view detection, localization and identification of people using multi-modal feature maps. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7726 LNCS, pp. 691–704). https://doi.org/10.1007/978-3-642-37431-9_53