This paper presents a novel approach to the task of video-based crowd counting, which can be formalized as the regression problem of learning a mapping from an input image to an output crowd density map. Convolutional neural networks (CNNs) have demonstrated striking accuracy gains in a range of computer vision tasks, including crowd counting. However, the crowd counting literature has focused predominantly on the single-frame case, or on applying CNNs to videos frame by frame without leveraging motion information. This paper proposes a novel architecture that exploits the spatiotemporal information captured in a video stream by combining an optical flow pyramid with an appearance-based CNN. Extensive empirical evaluation on five public datasets, with comparisons against numerous state-of-the-art approaches, demonstrates the efficacy of the proposed architecture, which achieves the best results on all datasets.
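The density-map formulation mentioned above can be illustrated with a minimal sketch. It is not taken from the paper: it assumes the common convention of building a regression target by placing a unit-mass Gaussian at each annotated head position, so that the sum over the map recovers the crowd count. The function names (`gaussian_kernel`, `density_map`) and the parameters `sigma` and `size` are hypothetical choices made for illustration.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a (size x size) Gaussian kernel normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    kernel = np.outer(g, g)
    return kernel / kernel.sum()


def density_map(points, height: int, width: int,
                sigma: float = 4.0, size: int = 15) -> np.ndarray:
    """Build a density map from integer (row, col) head annotations.

    Assumption (illustrative, not from the paper): each head contributes
    one unit of mass spread by a fixed-width Gaussian. Heads closer than
    size // 2 pixels to the border lose some mass when the padding is
    cropped away.
    """
    pad = size // 2
    padded = np.zeros((height + 2 * pad, width + 2 * pad))
    kernel = gaussian_kernel(size, sigma)
    for r, c in points:
        # In padded coordinates the kernel centre lands on (r + pad, c + pad).
        padded[r:r + size, c:c + size] += kernel
    return padded[pad:-pad, pad:-pad]


# Three hypothetical head annotations, all well inside a 64 x 96 frame.
heads = [(20, 30), (40, 50), (41, 52)]
target = density_map(heads, height=64, width=96)
estimated_count = target.sum()  # integrates the density over the frame
```

Under this convention, a network trained to regress `target` from a frame yields a count estimate by summing its predicted map, which is why the counting task can be cast as per-pixel regression.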
CITATION STYLE
Hossain, M. A., Cannons, K., Jang, D., Cuzzolin, F., & Xu, Z. (2021). Video-Based Crowd Counting Using a Multi-scale Optical Flow Pyramid Network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12626 LNCS, pp. 3–20). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-69541-5_1