Cross-view people tracking by scene-centered spatio-temporal parsing


Abstract

In this paper, we propose a Spatio-temporal Attributed Parse Graph (ST-APG) that integrates semantic attributes with trajectories for cross-view people tracking. Given videos from multiple cameras with overlapping fields of view (FOV), our goal is to parse the videos and organize the trajectories of all targets into a scene-centered representation. We leverage rich semantic attributes of humans, e.g., facing directions, postures, and actions, to enhance cross-view tracklet association, in addition to the appearance and geometry features frequently used in the literature. In particular, the facing direction of a human in 3D, once detected, often coincides with his/her moving direction or trajectory. Similarly, the actions of humans, once recognized, provide strong cues for distinguishing one subject from another. Inference is performed by iteratively grouping tracklets via cluster sampling and estimating people's semantic attributes via dynamic programming. In experiments, we validate our method on one public dataset and introduce a new dataset that records people's daily life in public spaces, e.g., a food court, an office reception, and a plaza, each recorded with 3-4 cameras. We evaluate the proposed method on these challenging videos and achieve promising multi-view tracking results.
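The cross-view association idea described above — scoring tracklet pairs by appearance, geometry, and semantic-attribute agreement, then grouping them into identities — can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual formulation: the tracklet fields, the affinity weights, and the greedy merging (a stand-in for the paper's cluster sampling; the dynamic-programming attribute estimation is omitted) are all assumptions.

```python
import math

def affinity(t1, t2, w_app=1.0, w_geo=1.0, w_attr=0.5):
    """Combine appearance, geometry, and semantic-attribute cues into a
    single cross-view association score (higher = more likely same person).
    The tracklet dict layout and weights here are illustrative assumptions."""
    # Appearance: negative squared distance between feature vectors.
    app = -sum((a - b) ** 2 for a, b in zip(t1["appearance"], t2["appearance"]))
    # Geometry: negative distance between ground-plane positions.
    geo = -math.dist(t1["ground_pos"], t2["ground_pos"])
    # Attributes: facing directions (radians, in the shared 3D scene frame)
    # and recognized actions of the same person should agree across views.
    face = math.cos(t1["facing"] - t2["facing"])
    act = 1.0 if t1["action"] == t2["action"] else -1.0
    return w_app * app + w_geo * geo + w_attr * (face + act)

def greedy_group(tracklets, threshold=0.0):
    """Greedy stand-in for cluster sampling: repeatedly merge groups that
    contain a tracklet pair with affinity above the threshold."""
    groups = [[t] for t in tracklets]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if any(affinity(a, b) > threshold
                       for a in groups[i] for b in groups[j]):
                    groups[i].extend(groups.pop(j))
                    merged = True
                    break
            if merged:
                break
    return groups
```

For example, two tracklets of the same walking person seen from different cameras (similar appearance, nearby ground position, agreeing facing direction and action) merge into one group, while a sitting person elsewhere in the scene stays separate.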

Citation (APA)

Xu, Y., Liu, X., Qin, L., & Zhu, S. C. (2017). Cross-view people tracking by scene-centered spatio-temporal parsing. In 31st AAAI Conference on Artificial Intelligence, AAAI 2017 (pp. 4299–4305). AAAI press. https://doi.org/10.1609/aaai.v31i1.11190
