Counting and localizing people in videos of low-density to high-density crowds involves several key challenges, including complex backgrounds, scale variations, nonuniform distributions, and occlusions. To address them, we propose a scale-driven convolutional neural network (SD-CNN) model, based on the assumption that heads are the dominant and visible features regardless of crowd density. To handle the different head scales in different regions of the videos, we annotate a set of heads at random locations and build a scale map that represents the mapping of head sizes. We then extract scale-aware proposals based on the scale map and feed them to the SD-CNN model, which acts as a head detector. The model produces a response matrix from which accurate head positions are obtained via non-maximum suppression. For experimental evaluation, we consider three standard datasets covering low-density to high-density crowd scenes. Our proposed SD-CNN model outperforms state-of-the-art methods in both frame-level and pixel-level analyses.
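To make the final step concrete, here is a minimal sketch of how head positions could be read off a response matrix with non-maximum suppression. The abstract does not specify the procedure, so this is a generic local-maximum variant and not necessarily the authors' implementation; the function name `local_maxima_nms` and the parameters `window` and `threshold` are hypothetical, and a fixed window is assumed where a scale-dependent one might be used in practice.

```python
import numpy as np


def local_maxima_nms(response, window=3, threshold=0.5):
    """Return (row, col) peaks of a 2D response matrix.

    A cell is kept if its score is at least `threshold` and no cell in the
    (window x window) neighbourhood centred on it has a higher score.
    Exact ties between neighbours are all kept (assumption for simplicity).
    """
    response = np.asarray(response, dtype=float)
    half = window // 2
    # Pad with -inf so border cells are compared only against real neighbours.
    padded = np.pad(response, half, mode="constant", constant_values=-np.inf)
    rows, cols = response.shape
    peaks = []
    for r in range(rows):
        for c in range(cols):
            score = response[r, c]
            if score < threshold:
                continue  # too weak to count as a head
            patch = padded[r:r + window, c:c + window]
            if score >= patch.max():
                peaks.append((r, c))
    return peaks


# Toy response matrix with two strong, well-separated responses.
toy = np.zeros((5, 5))
toy[1, 1] = 0.9   # strong response
toy[1, 2] = 0.6   # weaker neighbour, suppressed by (1, 1)
toy[3, 4] = 0.8   # second strong response
toy[4, 0] = 0.3   # below threshold, ignored
heads = local_maxima_nms(toy)
count = len(heads)
```

Under these assumptions, the length of the returned list serves as the per-frame count and the coordinates serve as the localizations.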
CITATION STYLE
Basalamah, S., Khan, S. D., & Ullah, H. (2019). Scale Driven Convolutional Neural Network Model for People Counting and Localization in Crowd Scenes. IEEE Access, 7, 71576–71584. https://doi.org/10.1109/ACCESS.2019.2918650