Motion-Transformer: Self-supervised pre-training for skeleton-based action recognition

Abstract

With the development of deep learning, skeleton-based action recognition has made great progress in recent years. However, most current works focus on extracting more informative spatial representations of the human body and have not made full use of the temporal dependencies already contained in sequences of human actions. To this end, we propose a novel transformer-based model, called Motion-Transformer, to sufficiently capture temporal dependencies via self-supervised pre-training on sequences of human actions. In addition, we propose to predict the motion flow of human skeletons to better learn the temporal dependencies in a sequence. The pre-trained model is then fine-tuned on the task of action recognition. Experimental results on the large-scale NTU RGB+D dataset show that our model is effective in modeling temporal relations and that flow-prediction pre-training helps expose the inherent dependencies along the time dimension. With this pre-training and fine-tuning paradigm, our final model outperforms previous state-of-the-art methods.
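To make the pre-training objective concrete, below is a minimal PyTorch sketch, assuming "motion flow" means the frame-to-frame displacement of joint coordinates and a plain MSE regression loss. The class names, layer sizes, and the 25-joint NTU RGB+D skeleton layout are illustrative assumptions, not the paper's exact configuration; positional encodings and masking details are also omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MotionFlowPretrainer(nn.Module):
    """Hypothetical sketch: a transformer encoder over per-frame skeleton
    tokens, pre-trained to regress motion flow (frame-to-frame joint
    displacement). Positional encodings are omitted for brevity."""

    def __init__(self, num_joints=25, coord_dim=3, d_model=256, nhead=8, num_layers=4):
        super().__init__()
        in_dim = num_joints * coord_dim
        self.embed = nn.Linear(in_dim, d_model)       # one token per frame
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.flow_head = nn.Linear(d_model, in_dim)   # predict per-frame flow

    def forward(self, skeletons):
        # skeletons: (batch, frames, num_joints * coord_dim)
        return self.flow_head(self.encoder(self.embed(skeletons)))

def motion_flow(skeletons):
    # Flow target: joint displacement between consecutive frames,
    # zero-padded at the final frame so shapes line up.
    flow = skeletons[:, 1:] - skeletons[:, :-1]
    return torch.cat([flow, torch.zeros_like(flow[:, :1])], dim=1)

# One pre-training step on random data (8 clips, 64 frames, 25 joints in 3D).
model = MotionFlowPretrainer()
x = torch.randn(8, 64, 25 * 3)
loss = F.mse_loss(model(x), motion_flow(x))
loss.backward()
```

For fine-tuning, one would presumably replace the flow head with a classification head over action labels and train the encoder end-to-end, following the pre-training and fine-tuning paradigm described in the abstract.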

Citation (APA)

Cheng, Y. B., Chen, X., Zhang, D., & Lin, L. (2021). Motion-transformer: Self-supervised pre-training for skeleton-based action recognition. In Proceedings of the 2nd ACM International Conference on Multimedia in Asia, MMAsia 2020. Association for Computing Machinery, Inc. https://doi.org/10.1145/3444685.3446289
