Skeleton-based human action recognition (HAR) has attracted a lot of research attentions because of robustness to variations of locations and appearances. However, most existing methods treat the whole skeleton as a fixed pattern, in which the importance of different skeleton joints for action recognition is not considered. In this paper, a novel CNN-based attention-ware network is proposed. First, to describe the semantic meaning of skeletons and learn the discriminative joints over time, an attention generate network named Global Attention Network (GAN) is proposed to generate attention masks. Then, to encode the spatial structure of skeleton sequences, we design a tree-based traversal (TTTM) rule, which can represent the skeleton structure, as a convolution unit of main network. Finally, the GAN and main network are cascaded as a whole network which is trained in an end-to-end manner. Experiments show that the TTTM and GAN are supplemented each other, and the whole network achieves an efficient improvement over the state-of-the-arts, e.g., the classification accuracy of this network was 83.6% and 89.5% on NTU-RGBD CV and CS dataset, which outperforms any other methods.
CITATION STYLE
Ding, R., Liu, C., & Liu, H. (2018). An attention-aware model for human action recognition on tree-based skeleton sequences. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11357 LNAI, pp. 569–579). Springer Verlag. https://doi.org/10.1007/978-3-030-05204-1_56
Mendeley helps you to discover research relevant for your work.