Towards Accurate Human Pose Estimation in Videos of Crowded Scenes

Abstract

Video-based human pose estimation in crowded scenes is a challenging problem due to occlusion, motion blur, scale variation, viewpoint change, and similar factors. Prior approaches typically fail on this problem because they (1) make little use of temporal information and (2) lack training data from crowded scenes. In this paper, we focus on improving human pose estimation in videos of crowded scenes by exploiting temporal context and collecting new data. In particular, we first follow the top-down strategy: we detect persons and perform single-person pose estimation on each frame. Then, we refine the frame-based pose estimates with temporal context derived from optical flow. Specifically, for each frame, we propagate the historical poses forward from the previous frames and the future poses backward from the subsequent frames to the current frame, leading to stable and accurate human pose estimation in videos. In addition, we mine new data from the Internet, drawn from scenes similar to those in the HIE dataset, to improve the diversity of the training set. In this way, our model achieves the best performance on 7 out of 13 videos and an average wAP of 56.33 on the test set of the HIE challenge.
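
The temporal refinement step can be illustrated with a minimal sketch. The abstract does not say how the propagated poses are combined with the per-frame estimate, so everything below is an assumption made for illustration: the function names `warp_keypoints` and `refine_pose`, the nearest-pixel flow sampling, and the fixed weighted average are all hypothetical, not the authors' method. The dense flow fields are assumed to come from any optical-flow estimator and to have shape (H, W, 2), storing per-pixel (dx, dy) displacements towards the current frame.

```python
import numpy as np


def warp_keypoints(keypoints, flow):
    """Move (K, 2) keypoints given as (x, y) along a dense flow field (H, W, 2).

    Hypothetical helper: the flow is sampled at the pixel nearest to each
    keypoint, clamped to the image bounds.
    """
    h, w = flow.shape[:2]
    xs = np.clip(np.rint(keypoints[:, 0]).astype(int), 0, w - 1)
    ys = np.clip(np.rint(keypoints[:, 1]).astype(int), 0, h - 1)
    return keypoints + flow[ys, xs]


def refine_pose(prev_pose, cur_pose, next_pose,
                flow_prev_to_cur, flow_next_to_cur,
                weights=(0.25, 0.5, 0.25)):
    """Fuse the current pose with poses propagated from neighbouring frames.

    Assumption: a fixed weighted average is used as the fusion rule. The
    source only states that historical poses are forwarded and future poses
    are propagated backward to the current frame.
    """
    forwarded = warp_keypoints(prev_pose, flow_prev_to_cur)   # past -> current
    backwarded = warp_keypoints(next_pose, flow_next_to_cur)  # future -> current
    w_prev, w_cur, w_next = weights
    total = w_prev + w_cur + w_next
    return (w_prev * forwarded + w_cur * cur_pose + w_next * backwarded) / total


# Toy example: one keypoint moving 2 px to the right per frame.
height, width = 32, 32
flow_prev_to_cur = np.zeros((height, width, 2))
flow_prev_to_cur[..., 0] = 2.0    # previous frame -> current frame
flow_next_to_cur = np.zeros((height, width, 2))
flow_next_to_cur[..., 0] = -2.0   # next frame -> current frame

prev_pose = np.array([[10.0, 10.0]])
cur_pose = np.array([[12.4, 10.0]])   # noisy per-frame estimate
next_pose = np.array([[14.0, 10.0]])

refined = refine_pose(prev_pose, cur_pose, next_pose,
                      flow_prev_to_cur, flow_next_to_cur)
```

In this toy case both propagated poses land on x = 12, so the fused estimate is pulled from the noisy 12.4 towards the temporally consistent position. A practical system would more likely weight each source by keypoint confidence and skip propagation where the flow is unreliable, for example under heavy occlusion.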

Cite

Chang, S., Yuan, L., Nie, X., Huang, Z., Zhou, Y., Chen, Y., … Yan, S. (2020). Towards Accurate Human Pose Estimation in Videos of Crowded Scenes. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 4630–4634). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3416299
