Applying multi-scale representations leads to consistent performance improvements on a wide range of image recognition tasks. However, with the addition of the temporal dimension in the video domain, directly obtaining layer-wise multi-scale spatial-temporal features incurs substantial extra computational cost. In this work, we propose a novel and efficient Multi-Scale Spatial-Temporal Integration Convolutional Tube (MSTI) that aims to recognize actions accurately at lower computational cost. The tube first extracts multi-scale spatial and temporal features through a multi-scale convolution block. To capture the interactions both across different-scale representations and between spatial appearance and temporal motion, we employ cross-scale attention weighted blocks that perform feature recalibration by integrating the multi-scale spatial and temporal features. Based on the proposed MSTI tube, we also present an end-to-end deep network, MSTI-Net, for human action recognition. Extensive experimental results show that our MSTI-Net significantly boosts the performance of existing convolutional networks and achieves state-of-the-art accuracy on three challenging benchmarks, i.e., UCF-101, HMDB-51 and Kinetics-400, with far fewer parameters and FLOPs.
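To make the described architecture concrete, the following is a minimal sketch of a multi-scale spatial-temporal tube with attention-based recalibration in PyTorch. The module names (`MSTITube`, `CrossScaleAttention`), kernel sizes, channel splits, and the squeeze-and-excitation-style gating are illustrative assumptions for exposition only, not the paper's exact MSTI design.

```python
# Hedged sketch: parallel multi-scale spatial convs plus a temporal conv,
# followed by channel-wise attention recalibration and fusion.
# All design details here are assumptions, not the authors' specification.
import torch
import torch.nn as nn


class CrossScaleAttention(nn.Module):
    """Squeeze-and-excitation style gating standing in for the paper's
    cross-scale attention weighted block (assumed form)."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool3d(1)           # global spatio-temporal context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                              # x: (N, C, T, H, W)
        w = self.pool(x).flatten(1)                    # (N, C)
        w = self.fc(w).view(x.size(0), -1, 1, 1, 1)    # per-channel weights
        return x * w                                   # recalibrated features


class MSTITube(nn.Module):
    """Toy tube: two spatial branches at different receptive-field scales
    plus a temporal branch, concatenated and recalibrated."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 3
        # Spatial branches (convolve over H, W only) at two scales.
        self.spatial_s1 = nn.Conv3d(in_ch, branch_ch, (1, 3, 3), padding=(0, 1, 1))
        self.spatial_s2 = nn.Conv3d(in_ch, branch_ch, (1, 3, 3),
                                    padding=(0, 2, 2), dilation=(1, 2, 2))
        # Temporal branch (convolve over T only).
        self.temporal = nn.Conv3d(in_ch, out_ch - 2 * branch_ch, (3, 1, 1), padding=(1, 0, 0))
        self.attn = CrossScaleAttention(out_ch)
        self.bn = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        feats = torch.cat([self.spatial_s1(x), self.spatial_s2(x), self.temporal(x)], dim=1)
        return self.relu(self.bn(self.attn(feats)))    # recalibrate, then normalize


if __name__ == "__main__":
    clip = torch.randn(2, 16, 8, 56, 56)               # (batch, channels, frames, H, W)
    print(MSTITube(16, 48)(clip).shape)                # torch.Size([2, 48, 8, 56, 56])
```

In this sketch the spatial branches keep the temporal extent untouched and the temporal branch keeps the spatial extent untouched, so the three feature maps align and can be concatenated before the attention block reweights channels across scales.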
Wu, H., Liu, J., Zhu, X., Wang, M., & Zha, Z. J. (2020). Multi-scale spatial-temporal integration convolutional tube for human action recognition. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2021-January, pp. 753–759). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/105