End-to-End Background Subtraction via a Multi-Scale Spatio-Temporal Model

Abstract

Background subtraction is an important task in computer vision. Traditional approaches usually build background models from low-level visual features such as color, texture, or edges. Because they lack deep features, they often perform poorly in complex video scenes involving illumination changes, background or camera motion, camouflage effects, and shadows. Recently, deep learning has been shown to be effective at extracting deep features. To improve the robustness of background subtraction, in this paper we propose an end-to-end multi-scale spatio-temporal (MS-ST) method that extracts deep features from video sequences. First, a video clip is fed into a convolutional neural network to extract multi-scale spatial features. Subsequently, to exploit temporal information, we combine temporal sampling operations with ConvLSTM modules to extract multi-scale temporal contextual information. Finally, the segmentation result is generated by fusing the multi-scale spatio-temporal features. Experimental results on the CDnet-2014 dataset and the LASIESTA dataset demonstrate the effectiveness and superiority of the proposed method.
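The abstract describes the pipeline only at a high level: per-frame multi-scale CNN features, a ConvLSTM recurrence over the clip, and a fusion step that produces the foreground mask. For orientation, below is a minimal PyTorch sketch of such a two-scale ConvLSTM design. All module names, channel widths, and the fusion head are illustrative assumptions, not the authors' implementation, and the paper's temporal sampling operations are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConvLSTMCell(nn.Module):
    """Standard ConvLSTM cell: convolutional gates over spatial feature maps."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.hid_ch = hid_ch
        # One convolution produces all four gate pre-activations at once.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, o, g = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c


class MSSTSketch(nn.Module):
    """Hypothetical MS-ST-style model: a shared CNN yields features at two
    scales, each scale gets its own ConvLSTM over time, and the final hidden
    states are fused into a single foreground mask."""

    def __init__(self, hid=16):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, hid, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.Conv2d(hid, hid, 3, stride=2, padding=1), nn.ReLU())
        self.lstm1 = ConvLSTMCell(hid, hid)   # full-resolution temporal branch
        self.lstm2 = ConvLSTMCell(hid, hid)   # half-resolution temporal branch
        self.head = nn.Conv2d(2 * hid, 1, 1)  # fuse both scales -> foreground logit

    def forward(self, clip):                  # clip: (B, T, 3, H, W) video tensor
        B, T, _, H, W = clip.shape
        hid = self.lstm1.hid_ch
        h1 = c1 = clip.new_zeros(B, hid, H, W)
        h2 = c2 = clip.new_zeros(B, hid, (H + 1) // 2, (W + 1) // 2)
        for t in range(T):                     # recurrent pass over the clip
            f1 = self.enc1(clip[:, t])         # full-resolution spatial features
            f2 = self.enc2(f1)                 # half-resolution spatial features
            h1, c1 = self.lstm1(f1, (h1, c1))
            h2, c2 = self.lstm2(f2, (h2, c2))
        up2 = F.interpolate(h2, size=(H, W), mode="bilinear", align_corners=False)
        return torch.sigmoid(self.head(torch.cat([h1, up2], dim=1)))  # (B, 1, H, W)


# Usage: a 5-frame 64x64 RGB clip in, a per-pixel foreground probability map out.
model = MSSTSketch()
mask = model(torch.rand(1, 5, 3, 64, 64))
```

Running the recurrence independently at each scale, then upsampling and concatenating before a 1x1 head, is one simple way to fuse multi-scale spatio-temporal features end to end; the paper's exact fusion scheme may differ.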

Citation (APA)

Yang, Y., Zhang, T., Hu, J., Xu, D., & Xie, G. (2019). End-to-End Background Subtraction via a Multi-Scale Spatio-Temporal Model. IEEE Access, 7, 97949–97958. https://doi.org/10.1109/ACCESS.2019.2930319
