Mutually reinforced spatio-temporal convolutional tube for human action recognition

Abstract

Recent works use 3D convolutional neural networks to explore spatio-temporal information for human action recognition. However, they either ignore the correlation between spatial and temporal features or incur a high computational cost when extracting spatio-temporal features. In this work, we propose a novel and efficient Mutually Reinforced Spatio-Temporal Convolutional Tube (MRST) for human action recognition. It decomposes 3D inputs into spatial and temporal representations, mutually enhances both by exploiting the interaction between spatial and temporal information, and selectively emphasizes informative spatial appearance and temporal motion, while reducing structural complexity. Moreover, we design three types of MRSTs that differ in the order in which spatial and temporal information is enhanced; each contains a spatio-temporal decomposition unit, a mutually reinforced unit, and a spatio-temporal fusion unit. We also propose an end-to-end deep network, MRST-Net, built on the MRSTs to better explore spatio-temporal information in human actions. Extensive experiments show that MRST-Net outperforms state-of-the-art approaches.
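
The complexity argument in the abstract can be illustrated with a small weight count. The sketch below is not the authors' implementation: it assumes the decomposition splits a k x k x k convolution into a 1 x k x k spatial branch and a k x 1 x 1 temporal branch that both read the same input, which is a common factorization but is not confirmed by the abstract. The function names and the channel sizes are hypothetical.

```python
def conv3d_params(c_in, c_out, kt, kh, kw, bias=False):
    """Number of learnable weights in one 3D convolution layer."""
    return c_in * c_out * kt * kh * kw + (c_out if bias else 0)


def full_3d_params(c_in, c_out, k=3):
    """A single k x k x k convolution over (time, height, width)."""
    return conv3d_params(c_in, c_out, k, k, k)


def decomposed_params(c_in, c_out, k=3):
    """Assumed decomposition (not taken from the paper): a 1 x k x k
    spatial branch plus a k x 1 x 1 temporal branch on the same input."""
    spatial = conv3d_params(c_in, c_out, 1, k, k)
    temporal = conv3d_params(c_in, c_out, k, 1, 1)
    return spatial + temporal


if __name__ == "__main__":
    c_in, c_out = 64, 64  # hypothetical channel sizes
    full = full_3d_params(c_in, c_out)
    split = decomposed_params(c_in, c_out)
    print(f"full 3x3x3 weights: {full}")
    print(f"decomposed weights: {split}")
    print(f"ratio: {split / full:.3f}")
```

Under these assumptions, with k = 3 the two branches together hold 12/27 (about 44%) of the weights of the full 3D kernel, regardless of channel count. Any reinforcement or fusion units would add weights on top of this, so the figure is only a rough bound on the savings from decomposition alone.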

Citation (APA)

Wu, H., Liu, J., Zha, Z. J., Chen, Z., & Sun, X. (2019). Mutually reinforced spatio-temporal convolutional tube for human action recognition. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 968–974). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/136
