Spatiotemporal Saliency Based Multi-stream Networks for Action Recognition

Zhenbing Liu; Zeya Li; Ming Zong; Wanting Ji; Ruili Wang; Yan Tian

Conference Proceedings

Communications in Computer and Information Science (2020) 1180 CCIS 74-84

DOI: 10.1007/978-981-15-3651-9_8

Abstract

Human action recognition is a challenging research topic because videos often contain cluttered backgrounds, which impair recognition performance. In this paper, we propose a novel spatiotemporal saliency based multi-stream ResNet for human action recognition, which combines three streams: a spatial stream with RGB frames as input, a temporal stream with optical flow frames as input, and a spatiotemporal saliency stream with spatiotemporal saliency maps as input. The spatiotemporal saliency stream captures spatiotemporal object foreground information from saliency maps generated by a geodesic distance based video segmentation method. This architecture reduces background interference in videos and provides spatiotemporal object foreground information for human action recognition. Experimental results on the UCF101 and HMDB51 datasets demonstrate that the complementary spatiotemporal information further improves action recognition performance, and that the proposed method achieves competitive performance compared with state-of-the-art methods.
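The abstract does not say how the three streams are combined. As a rough illustration only, the sketch below assumes late fusion: each stream produces per-class scores, the scores are converted to probabilities, and a weighted average gives the final prediction. The function names, the stream weights, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits, weights=None):
    """Weighted average of per-stream class probabilities.

    stream_logits maps a stream name to its list of class scores.
    weights maps a stream name to a non-negative weight; equal
    weights are used when it is omitted. Averaging is an assumed
    fusion rule, not one stated in the abstract.
    """
    names = list(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    num_classes = len(stream_logits[names[0]])
    total_weight = sum(weights[name] for name in names)
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(stream_logits[name])
        if len(probs) != num_classes:
            raise ValueError("all streams must score the same classes")
        for i, p in enumerate(probs):
            fused[i] += weights[name] * p / total_weight
    return fused


def predict(stream_logits, weights=None):
    """Index of the class with the highest fused probability."""
    fused = fuse_streams(stream_logits, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Made-up scores for three classes, one list per stream.
example_logits = {
    "spatial": [2.0, 1.0, 0.1],    # RGB frames
    "temporal": [0.5, 2.5, 0.3],   # optical flow frames
    "saliency": [0.2, 2.0, 0.1],   # spatiotemporal saliency maps
}
predicted_class = predict(example_logits)
```

In this made-up example the spatial stream alone favors class 0, while the temporal and saliency streams both favor class 1, so the fused prediction follows the two streams that agree. This is the kind of complementary behavior the abstract attributes to the additional stream.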

Cite

Liu, Z., Li, Z., Zong, M., Ji, W., Wang, R., & Tian, Y. (2020). Spatiotemporal Saliency Based Multi-stream Networks for Action Recognition. In Communications in Computer and Information Science (Vol. 1180 CCIS, pp. 74–84). Springer. https://doi.org/10.1007/978-981-15-3651-9_8
