Spatiotemporal Saliency Based Multi-stream Networks for Action Recognition

Abstract

Human action recognition is a challenging research topic because videos often contain cluttered backgrounds, which impair recognition performance. In this paper, we propose a novel spatiotemporal saliency based multi-stream ResNet for human action recognition, which combines three streams: a spatial stream with RGB frames as input, a temporal stream with optical flow frames as input, and a spatiotemporal saliency stream with spatiotemporal saliency maps as input. The spatiotemporal saliency stream captures spatiotemporal object foreground information from saliency maps generated by a geodesic distance based video segmentation method. This architecture reduces background interference in videos and supplies spatiotemporal object foreground information for human action recognition. Experimental results on the UCF101 and HMDB51 datasets demonstrate that the complementary spatiotemporal information further improves action recognition performance, and that the proposed method achieves competitive performance compared with state-of-the-art methods.
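The abstract does not say how the three streams are combined, so the following is only a minimal sketch of one common option, late fusion by weighted averaging of per-stream class probabilities. The function names `softmax` and `fuse_streams`, the equal default weights, and the toy logits are all hypothetical illustrations and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(stream_logits, weights=None):
    """Weighted average of per-stream class probabilities (late fusion).

    stream_logits: dict mapping a stream name to its list of class logits.
    weights: optional dict of per-stream weights; defaults to equal weights.
    The fusion rule and the weights are assumptions, not the paper's method.
    """
    names = list(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    norm = sum(weights[name] for name in names)
    num_classes = len(stream_logits[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(stream_logits[name])
        for c in range(num_classes):
            fused[c] += weights[name] * probs[c] / norm
    return fused


# Hypothetical logits for a 3-class toy problem, one list per stream.
logits = {
    "spatial": [2.0, 1.0, 0.1],    # stream fed with RGB frames
    "temporal": [0.5, 2.5, 0.3],   # stream fed with optical flow frames
    "saliency": [0.2, 3.0, 0.1],   # stream fed with saliency maps
}
fused = fuse_streams(logits)
predicted_class = max(range(len(fused)), key=fused.__getitem__)
```

In this toy example the spatial stream alone favours class 0, while the temporal and saliency streams favour class 1, so the fused prediction follows the two agreeing streams. In practice each list of logits would come from a trained network for that stream.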

Cite

Liu, Z., Li, Z., Zong, M., Ji, W., Wang, R., & Tian, Y. (2020). Spatiotemporal Saliency Based Multi-stream Networks for Action Recognition. In Communications in Computer and Information Science (Vol. 1180 CCIS, pp. 74–84). Springer. https://doi.org/10.1007/978-981-15-3651-9_8
