An efficient 3D-NAS method for video-based gesture recognition


Abstract

3D convolutional neural networks (3DCNNs) are powerful and effective models for exploiting spatial-temporal features, especially for gesture recognition. Unfortunately, 3DCNNs involve so many parameters that many researchers turn to 2DCNNs or hybrid models instead, and these models are designed manually. In this paper, we propose a framework that automatically constructs a 3DCNN-based model via network architecture search (NAS) [1]. In our method, called 3DNAS, a 3D teacher network is first trained from scratch and used as a pre-trained model to accelerate the convergence of the child networks. A series of child networks with various architectures is then generated randomly, and each is trained under the direction of the converted teacher model. Finally, the controller predicts a network architecture according to the rewards of all the child networks. We evaluate our method on the video-based gesture recognition dataset 20BN-Jester v1 [2], and the results show that our approach is superior to prior methods in both efficiency and accuracy.
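To make the three stages of the pipeline concrete (teacher pre-training, teacher-guided training of randomly generated children, reward-based architecture selection), the following is a minimal sketch in PyTorch. It is not the authors' implementation: the architecture encoding, the soft-label distillation loss, and the validation-accuracy reward are illustrative assumptions, and the learned controller of the paper is replaced here by simple random sampling followed by reward ranking.

# Hypothetical sketch of a 3D-NAS-style search loop; all design choices
# below (search space, loss, reward) are illustrative, not from the paper.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def build_3dcnn(channels, num_classes=27):
    """Stack of Conv3d blocks defined by a list of channel widths."""
    layers, in_ch = [], 3
    for out_ch in channels:
        layers += [nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
                   nn.BatchNorm3d(out_ch),
                   nn.ReLU(inplace=True),
                   nn.MaxPool3d(2)]
        in_ch = out_ch
    layers += [nn.AdaptiveAvgPool3d(1), nn.Flatten(),
               nn.Linear(in_ch, num_classes)]
    return nn.Sequential(*layers)

def sample_architecture():
    """Randomly sample a child architecture (depth and widths)."""
    depth = random.randint(2, 4)
    return [random.choice([16, 32, 64]) for _ in range(depth)]

def distill_step(child, teacher, clips, labels, opt, T=4.0, alpha=0.7):
    """One step: cross-entropy plus softened KL from the teacher's logits."""
    teacher.eval()
    with torch.no_grad():
        t_logits = teacher(clips)
    s_logits = child(clips)
    loss = (1 - alpha) * F.cross_entropy(s_logits, labels) \
         + alpha * T * T * F.kl_div(F.log_softmax(s_logits / T, dim=1),
                                    F.softmax(t_logits / T, dim=1),
                                    reduction="batchmean")
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

def reward(child, clips, labels):
    """Validation accuracy used as the reward signal for the controller."""
    child.eval()
    with torch.no_grad():
        return (child(clips).argmax(1) == labels).float().mean().item()

if __name__ == "__main__":
    # Synthetic stand-in for 20BN-Jester clips: (batch, C, T, H, W).
    clips = torch.randn(8, 3, 16, 32, 32)
    labels = torch.randint(0, 27, (8,))

    # The paper pre-trains the teacher from scratch; here it stays random.
    teacher = build_3dcnn([32, 64, 128])

    best_arch, best_r = None, -1.0
    for _ in range(4):  # a handful of randomly generated child networks
        arch = sample_architecture()
        child = build_3dcnn(arch)
        opt = torch.optim.SGD(child.parameters(), lr=0.01, momentum=0.9)
        for _ in range(3):  # brief training under the teacher's direction
            distill_step(child, teacher, clips, labels, opt)
        r = reward(child, clips, labels)
        if r > best_r:
            best_arch, best_r = arch, r
    print("selected architecture:", best_arch, "reward:", round(best_r, 3))

In the paper the final selection is made by a controller trained on the child networks' rewards; the argmax over randomly sampled children above is only a stand-in for that component.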

Citation (APA)

Guo, Z., Chen, Y., Huang, W., & Zhang, J. (2019). An efficient 3D-NAS method for video-based gesture recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11729 LNCS, pp. 319–329). Springer Verlag. https://doi.org/10.1007/978-3-030-30508-6_26
