Gated 3D-CNN for Action Recognition

Abstract

Human action recognition is an active area of computer vision. Most approaches build on well-developed image recognition algorithms that use convolutional neural networks (CNNs) or recurrent neural networks (RNNs). Action recognition is considered more challenging than image recognition because a video is a sequence of images that changes from frame to frame, so the model has to handle spatial and temporal information simultaneously. Recently proposed methods based on two-stream fusion perform well on such tasks. However, these methods are computationally expensive and complex to build for learning the spatio-temporal dependencies of an action. This paper proposes a simple yet efficient deep neural network architecture, Gated 3D-CNN, which consists of 3D convolutional layers and gating modules that act like an LSTM model to learn spatial and temporal dependencies and to attend to essential features. The proposed method first learns spatial and temporal features of actions through a 3D-CNN. Then, sigmoid-gated 3D convolution layers with local and global gating help direct attention to the essential features of the action. The proposed architecture is comparatively simple to implement and achieves competitive performance on the UCF-101 dataset.
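The abstract does not spell out the gating equations, so the following is only a minimal sketch of the general idea of a sigmoid-gated 3D convolution: one 3D convolution produces features, a second produces gate logits, and the sigmoid of the logits scales the features element-wise. The function names (conv3d, gated_conv3d), the single-channel input, the "valid" padding, and the element-wise form of the gate are all assumptions made for illustration. They are not taken from the paper, and the distinction between local and global gating is not modelled here.

```python
import math
from itertools import product


def conv3d(x, w, bias=0.0):
    """Valid 3D cross-correlation of a single-channel clip.

    x is a nested list indexed as x[t][h][w] (frames, height, width);
    w is a kernel indexed the same way. Hypothetical helper, not the
    paper's implementation.
    """
    T, H, W = len(x), len(x[0]), len(x[0][0])
    kt, kh, kw = len(w), len(w[0]), len(w[0][0])
    oT, oH, oW = T - kt + 1, H - kh + 1, W - kw + 1
    out = [[[bias for _ in range(oW)] for _ in range(oH)] for _ in range(oT)]
    for t, i, j in product(range(oT), range(oH), range(oW)):
        # Sum over the spatio-temporal neighbourhood covered by the kernel.
        out[t][i][j] += sum(
            x[t + a][i + b][j + c] * w[a][b][c]
            for a, b, c in product(range(kt), range(kh), range(kw))
        )
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gated_conv3d(x, w_feat, w_gate, b_gate=0.0):
    """Assumed gating form: conv3d(x, w_feat) * sigmoid(conv3d(x, w_gate) + b_gate)."""
    feat = conv3d(x, w_feat)
    gate = conv3d(x, w_gate, b_gate)
    # Element-wise product: gate values near 0 suppress a feature,
    # gate values near 1 let it pass through unchanged.
    return [
        [[f * sigmoid(g) for f, g in zip(f_row, g_row)]
         for f_row, g_row in zip(f_plane, g_plane)]
        for f_plane, g_plane in zip(feat, gate)
    ]


# Toy clip: 3 frames of 3x3 pixels, all ones, with 2x2x2 kernels.
clip = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
w_ones = [[[1.0] * 2 for _ in range(2)] for _ in range(2)]
w_zeros = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
half_open = gated_conv3d(clip, w_ones, w_zeros)
```

In this toy example every feature value is 8.0 and every gate logit is 0, so the sigmoid gate is 0.5 and each entry of half_open is 4.0. A strongly negative b_gate drives the output towards zero, and a strongly positive one passes the features through almost unchanged, which is the sense in which such a gate can emphasise some features over others.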

Cite

Shrestha, L., Dubey, S., Olimov, F., & Jeon, M. (2022). Gated 3D-CNN for Action Recognition. In Communications in Computer and Information Science (Vol. 1716 CCIS, pp. 556–565). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-8234-7_43
