Spatio-temporal attention model for foreground detection in cross-scene surveillance videos

Abstract

Foreground detection is an important theme in video surveillance. Conventional background modeling approaches build sophisticated temporal statistical models to detect foreground from low-level features, while modern semantic/instance segmentation approaches generate high-level foreground annotations but ignore the temporal relevance among consecutive frames. In this paper, we propose a Spatio-Temporal Attention Model (STAM) for cross-scene foreground detection. To bridge the semantic gap between low- and high-level features, appearance and optical flow features are fused by attention modules during feature learning. Experimental results on the CDnet 2014 benchmark show that STAM outperforms many state-of-the-art methods on seven evaluation metrics. The attention modules and optical flow improve the F-measure by 9% and 6%, respectively. Without any fine-tuning, the model generalizes across scenes on the Wallflower and PETS datasets. The processing speed is 10.8 fps at a frame size of 256 × 256.
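As a rough illustration of the fusion idea described in the abstract, the following PyTorch sketch shows one way an attention module could weight appearance features against optical-flow features per pixel. The module layout, names, and channel counts here are assumptions for illustration only, not the paper's actual STAM architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Fuse appearance and optical-flow feature maps with a learned
    per-pixel attention map (hypothetical layout, not the paper's exact one)."""

    def __init__(self, channels: int):
        super().__init__()
        # A 1x1 conv predicts one attention weight per stream at each pixel;
        # softmax over the stream dimension makes the two weights sum to 1.
        self.attn = nn.Sequential(
            nn.Conv2d(2 * channels, 2, kernel_size=1),
            nn.Softmax(dim=1),
        )

    def forward(self, appearance: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
        # appearance, flow: (N, C, H, W) feature maps from two encoder streams
        w = self.attn(torch.cat([appearance, flow], dim=1))  # (N, 2, H, W)
        # Weighted sum of the two streams, pixel by pixel.
        return w[:, 0:1] * appearance + w[:, 1:2] * flow


if __name__ == "__main__":
    fuse = AttentionFusion(channels=64)
    app = torch.randn(1, 64, 32, 32)   # appearance features
    flo = torch.randn(1, 64, 32, 32)   # optical-flow features
    out = fuse(app, flo)
    print(out.shape)  # torch.Size([1, 64, 32, 32])
```

A learned per-pixel weighting like this lets the network lean on motion cues where appearance is ambiguous (e.g., camouflaged foreground) and on appearance where flow is unreliable, which is one plausible reading of how attention-based fusion could help cross-scene generalization.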

Citation (APA)

Liang, D., Pan, J., Sun, H., & Zhou, H. (2019). Spatio-temporal attention model for foreground detection in cross-scene surveillance videos. Sensors (Switzerland), 19(23). https://doi.org/10.3390/s19235142
