Spatiotemporal interaction residual networks with pseudo3D for video action recognition

Abstract

Action recognition is a significant and challenging topic in the fields of sensors and computer vision. Two-stream convolutional neural networks (CNNs) and 3D CNNs are the two mainstream deep learning architectures for video action recognition. To combine them in one framework and further improve performance, we propose a novel deep network, the spatiotemporal interaction residual network with pseudo3D (STINP). The STINP has three advantages. First, it consists of two branches, both built on residual networks (ResNets), that learn the spatial and temporal information of the video simultaneously. Second, it integrates the pseudo3D block into the residual units of the spatial branch, so that this branch learns not only the appearance features of the objects and scene in the video but also the potential interaction information among consecutive frames. Finally, it fuses the spatial and temporal branches with a simple but effective multiplication operation, so that the learned spatial and temporal representations interact with each other throughout training. Experiments were conducted on two classic action recognition datasets, UCF101 and HMDB51. The results show that the proposed STINP outperforms other state-of-the-art algorithms on video action recognition.
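
The two mechanisms named in the abstract, the pseudo3D block and multiplicative fusion, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that a pseudo3D block replaces a full 3x3x3 convolution with a 1x3x3 spatial filter followed by a 3x1x1 temporal filter (the serial ordering is an assumption), and that fusion multiplies the temporal-branch features into the spatial branch element-wise before the residual function is applied (the exact placement is also an assumption). All function names are hypothetical, feature maps are single-channel nested lists, and nonlinearities and normalization are omitted.

```python
from typing import List

Frame = List[List[float]]  # one H x W feature map
Clip = List[Frame]         # T feature maps stacked along time


def spatial_conv(clip: Clip, kernel: List[List[float]]) -> Clip:
    """Apply one 3x3 filter to each frame independently (zero padding)."""
    out = []
    for frame in clip:
        h, w = len(frame), len(frame[0])
        new = [[0.0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                acc = 0.0
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        y, x = i + di, j + dj
                        if 0 <= y < h and 0 <= x < w:
                            acc += kernel[di + 1][dj + 1] * frame[y][x]
                new[i][j] = acc
        out.append(new)
    return out


def temporal_conv(clip: Clip, kernel: List[float]) -> Clip:
    """Apply one length-3 filter along time at every pixel (zero padding)."""
    t, h, w = len(clip), len(clip[0]), len(clip[0][0])
    out = [[[0.0] * w for _ in range(h)] for _ in range(t)]
    for k in range(t):
        for dk in (-1, 0, 1):
            if 0 <= k + dk < t:
                for i in range(h):
                    for j in range(w):
                        out[k][i][j] += kernel[dk + 1] * clip[k + dk][i][j]
    return out


def pseudo3d(clip: Clip, k2d: List[List[float]], k1d: List[float]) -> Clip:
    """Factorized 3D filtering: spatial 1x3x3 step, then temporal 3x1x1 step."""
    return temporal_conv(spatial_conv(clip, k2d), k1d)


def multiply(a: Clip, b: Clip) -> Clip:
    """Element-wise product of two clips of identical shape."""
    return [[[x * y for x, y in zip(ra, rb)] for ra, rb in zip(fa, fb)]
            for fa, fb in zip(a, b)]


def add(a: Clip, b: Clip) -> Clip:
    """Element-wise sum of two clips of identical shape."""
    return [[[x + y for x, y in zip(ra, rb)] for ra, rb in zip(fa, fb)]
            for fa, fb in zip(a, b)]


def fused_residual_unit(spatial: Clip, temporal: Clip,
                        k2d: List[List[float]], k1d: List[float]) -> Clip:
    """Assumed fusion: gate spatial features by temporal ones, then add the skip path."""
    gated = multiply(spatial, temporal)
    return add(spatial, pseudo3d(gated, k2d, k1d))
```

Calling `fused_residual_unit(spatial, temporal, k2d, k1d)` with two clips of the same shape returns a clip of that shape. Because the multiplication sits inside the residual function, the identity skip path of the spatial branch is left untouched, while gradients flowing through the product reach both branches. Under this assumption, that is one way the two representations could interact during training.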

Citation (APA)

Chen, J., Kong, J., Sun, H., Xu, H., Liu, X., Lu, Y., & Zheng, C. (2020). Spatiotemporal interaction residual networks with pseudo3d for video action recognition. Sensors (Switzerland), 20(11). https://doi.org/10.3390/s20113126