People counting in videos by fusing temporal cues from spatial context-aware convolutional neural networks

Abstract

We present an efficient method for people counting in video sequences from fixed cameras by utilising the responses of spatially context-aware convolutional neural networks (CNNs) in the temporal domain. For stationary cameras, the background remains fairly static, while foreground characteristics, such as size and orientation, may depend on image location; training a CNN on whole frames therefore improves the differentiation between background and foreground pixels. The foreground density, which represents the presence of people in the scene, can then be associated with people counts. Moreover, fusing the count estimates in the temporal domain further enhances the accuracy of the final count. Our methodology was tested on the publicly available Mall dataset and achieved a mean deviation error of 0.091.
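The abstract does not include an implementation, so the sketch below is purely illustrative of the two ideas it describes: regressing a per-pixel foreground-density map whose sum gives a per-frame count, then fusing the per-frame estimates in the temporal domain. The network architecture, the moving-average fusion, the window size, and all names (DensityCNN, fuse_counts) are our assumptions, not the authors' actual model.

import torch
import torch.nn as nn

class DensityCNN(nn.Module):
    """Toy fully convolutional network (an assumed stand-in, not the
    paper's architecture) mapping a greyscale frame to a foreground
    density map; summing the map yields a per-frame count estimate."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=1),  # one density value per pixel
        )

    def forward(self, frame):               # frame: (B, 1, H, W)
        density = self.features(frame)      # (B, 1, H, W) density map
        return density.sum(dim=(1, 2, 3))   # per-frame count estimate

def fuse_counts(counts, window=5):
    """Temporal fusion sketch: a centred moving average over `window`
    consecutive per-frame count estimates (the fusion rule is assumed)."""
    counts = torch.as_tensor(counts, dtype=torch.float32)
    kernel = torch.ones(1, 1, window) / window
    fused = nn.functional.conv1d(counts.view(1, 1, -1), kernel,
                                 padding=window // 2)
    return fused.view(-1)

if __name__ == "__main__":
    model = DensityCNN()
    frames = torch.rand(8, 1, 120, 160)     # a short synthetic clip of 8 frames
    per_frame = model(frames)               # raw per-frame estimates
    print(fuse_counts(per_frame.detach(), window=3))

Averaging over a temporal window is only one plausible fusion rule; any scheme that combines neighbouring frame estimates (e.g. a median filter) would fit the same interface.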

Citation (APA)
Sourtzinos, P., Velastin, S. A., Jara, M., Zegers, P., & Makris, D. (2016). People counting in videos by fusing temporal cues from spatial context-aware convolutional neural networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9914 LNCS, pp. 655–667). Springer Verlag. https://doi.org/10.1007/978-3-319-48881-3_46
