Human action recognition with 3D convolution skip-connections and RNNs

Abstract

This paper proposes a novel network architecture for human action recognition. First, we employ a pre-trained spatio-temporal feature extractor to extract spatio-temporal features from videos. Then, spatio-temporal features from several levels are concatenated via 3D convolution skip-connections, and a batch normalization layer normalizes the concatenated features. Subsequently, we feed these normalized features into an RNN to model temporal dependencies, which enables our network to handle long-term information. In addition, for data augmentation, we divide each video into three parts and split each part into non-overlapping 16-frame clips. Finally, the proposed method is evaluated on the UCF101 dataset and compared with existing state-of-the-art methods. Experimental results demonstrate that our method achieves the highest recognition accuracy among the compared methods.
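For concreteness, the sketch below illustrates the pipeline the abstract describes, assuming PyTorch. The class name SkipConnRNN, the helper split_into_clips, the C3D-style backbone, and all channel counts and layer sizes are hypothetical stand-ins; the abstract does not give the authors' implementation details.

import torch
import torch.nn as nn

class SkipConnRNN(nn.Module):
    def __init__(self, num_classes=101, hidden=512):
        super().__init__()
        # Illustrative stand-in for the pre-trained spatio-temporal extractor
        # (e.g. a C3D-style backbone); sizes are assumptions, not the paper's.
        self.block1 = nn.Sequential(
            nn.Conv3d(3, 64, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)))                    # -> 64 x 16 x 56 x 56
        self.block2 = nn.Sequential(
            nn.Conv3d(64, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2))                            # -> 128 x 8 x 28 x 28
        # 3D-conv skip-connection: project the lower-level features so they
        # match the higher level, then concatenate along channels.
        self.skip = nn.Conv3d(64, 128, 3, stride=2, padding=1)
        self.bn = nn.BatchNorm3d(256)                   # normalize concatenated features
        self.pool = nn.AdaptiveAvgPool3d((8, 1, 1))     # keep the temporal axis
        self.rnn = nn.LSTM(256, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, num_classes)

    def forward(self, x):                               # x: (N, 3, 16, 112, 112)
        f1 = self.block1(x)
        f2 = self.block2(f1)
        fused = self.bn(torch.cat([self.skip(f1), f2], dim=1))
        seq = self.pool(fused).flatten(2).transpose(1, 2)  # (N, 8, 256)
        out, _ = self.rnn(seq)                             # model temporal dependencies
        return self.fc(out[:, -1])                         # classify from last hidden state

def split_into_clips(frames, clip_len=16):
    # Data augmentation per the abstract: divide the video into three parts,
    # then split each part into non-overlapping 16-frame clips.
    part_len = len(frames) // 3
    parts = [frames[i * part_len:(i + 1) * part_len] for i in range(3)]
    return [p[j:j + clip_len]
            for p in parts
            for j in range(0, len(p) - clip_len + 1, clip_len)]

Concatenating the skip-projected lower-level features with the higher-level ones gives the RNN access to multi-scale spatio-temporal cues, which is the stated purpose of the 3D convolution skip-connections.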

Citation (APA)

Song, J., Yang, Z., Zhang, Q., Fang, T., Hu, G., Han, J., & Chen, C. (2018). Human action recognition with 3D convolution skip-connections and RNNs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11301 LNCS, pp. 319–331). Springer Verlag. https://doi.org/10.1007/978-3-030-04167-0_29
