3D convolutional neural networks (3DCNNs) are powerful and effective models for exploiting spatial-temporal features, especially for gesture recognition. Unfortunately, 3DCNNs contain so many parameters that many researchers turn to 2DCNNs or hybrid models instead, and these models are designed manually. In this paper, we propose a framework that automatically constructs a 3DCNN-based model via neural architecture search (NAS) [1]. In our method, called 3DNAS, a 3D teacher network is first trained from scratch as a pre-trained model to accelerate the convergence of the child networks. Then a series of child networks with various architectures is generated randomly, and each is trained under the guidance of the converted teacher model. Finally, the controller predicts a network architecture according to the rewards of all the child networks. We evaluate our method on the video-based gesture recognition dataset 20BN-Jester v1 [2], and the results show that our approach outperforms prior methods in both efficiency and accuracy.
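The search loop the abstract describes (randomly sample child architectures, score each with a reward, let the controller pick the best) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the search space (per-layer kernel size, channel width, depth) and the reward function are hypothetical placeholders; in 3DNAS the reward would be the validation accuracy of a child network trained from the converted teacher weights.

```python
import random

# Hypothetical search space for a 3D CNN; the paper's actual space is
# not specified in the abstract.
SEARCH_SPACE = {"kernel": [1, 3, 5], "channels": [16, 32, 64], "depth": [2, 3, 4]}

def sample_child():
    """Randomly generate one child architecture from the search space."""
    return {key: random.choice(options) for key, options in SEARCH_SPACE.items()}

def evaluate(arch):
    """Stand-in reward; in practice this would be the validation accuracy
    of the child network fine-tuned from the teacher's weights."""
    return arch["channels"] / 64 + arch["depth"] / 4 - arch["kernel"] / 10

def search(num_children=20, seed=0):
    """Sample children, score each, and return the best (architecture, reward)."""
    random.seed(seed)
    children = [sample_child() for _ in range(num_children)]
    rewards = [evaluate(child) for child in children]
    # The controller selects the architecture with the highest reward.
    return max(zip(children, rewards), key=lambda pair: pair[1])

arch, reward = search()
print(arch, round(reward, 3))
```

With a learned controller (e.g. an RNN trained by policy gradient, as in standard NAS), `sample_child` would be replaced by the controller's sampling distribution, updated from the observed rewards.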
CITATION STYLE
Guo, Z., Chen, Y., Huang, W., & Zhang, J. (2019). An efficient 3D-NAS method for video-based gesture recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11729 LNCS, pp. 319–329). Springer Verlag. https://doi.org/10.1007/978-3-030-30508-6_26