In recent years, human action recognition based on skeleton information has drawn increasing attention with the publication of large-scale skeleton datasets. The most crucial factors for this task lie in two aspects: the intra-frame representation of joint co-occurrences and the inter-frame representation of the skeleton's temporal evolution. The most effective approaches rely on automatic feature extraction with deep learning. However, they ignore the structural information of skeleton joints and the correlations between different skeleton joints. In this paper, we do not simply treat joint position information as unordered points. Instead, we propose a novel data reorganizing strategy to represent the global and local structure information of human skeleton joints. We also employ data mirroring to strengthen the relationships between skeleton joints. Based on this design, we propose an end-to-end multi-dimensional CNN (SRNet) that fully exploits spatial and temporal information to learn the feature extraction transform. Specifically, this network applies convolution kernels along different dimensions of the skeleton representation to make the most of human structural information and generate robust features. Finally, we compare our method with other state-of-the-art methods on action recognition datasets including NTU RGB+D, PKU-MMD, SYSU, UT-Kinect, and HDM05. The experimental results demonstrate the superiority of our method.
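To make the per-dimension convolution idea concrete, the sketch below shows one plausible reading of the abstract in PyTorch: a skeleton clip is arranged as a (coordinates, frames, joints) tensor, the joint axis is mirrored so that kernels can relate a joint to its reflected counterpart, and kernels of shape (1, k) and (k, 1) convolve along the joint and frame dimensions, respectively. This is a minimal sketch, not the authors' released code; the mirroring scheme, kernel sizes, channel widths, and the `SRNetSketch` and `mirror_joints` names are all illustrative assumptions.

```python
# Minimal sketch of the ideas described in the abstract (assumed design,
# not the paper's actual architecture or hyperparameters).
import torch
import torch.nn as nn


def mirror_joints(x: torch.Tensor) -> torch.Tensor:
    """Concatenate the joint axis with its reversed copy so convolutions
    can relate joint i to joint J-1-i (assumed form of 'data mirroring')."""
    return torch.cat([x, x.flip(dims=[-1])], dim=-1)  # (N, 3, T, 2J)


class SRNetSketch(nn.Module):
    def __init__(self, num_classes: int = 60):
        super().__init__()
        # Kernel (1, 3) convolves along the joint dimension
        # (intra-frame joint co-occurrence).
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=(1, 3), padding=(0, 1)),
            nn.ReLU(inplace=True),
        )
        # Kernel (3, 1) convolves along the frame dimension
        # (inter-frame temporal evolution).
        self.temporal = nn.Sequential(
            nn.Conv2d(64, 128, kernel_size=(3, 1), padding=(1, 0)),
            nn.ReLU(inplace=True),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, 3, T, J) -- batch, xyz coordinates, frames, joints.
        x = mirror_joints(x)   # (N, 3, T, 2J)
        x = self.spatial(x)    # joint-dimension kernels
        x = self.temporal(x)   # frame-dimension kernels
        return self.head(x)    # (N, num_classes)


# Usage: a batch of 2 clips, 20 frames, 25 joints (the NTU RGB+D layout).
logits = SRNetSketch()(torch.randn(2, 3, 20, 25))
print(logits.shape)  # torch.Size([2, 60])
```

Arranging the clip as a dense tensor rather than a bag of points is what lets kernel shape select the dimension being modeled: a (1, k) kernel mixes neighboring joints within a frame, while a (k, 1) kernel mixes the same joint across neighboring frames.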
CITATION STYLE
Nie, W., Wang, W., & Huang, X. (2019). SRNet: Structured Relevance Feature Learning Network from Skeleton Data for Human Action Recognition. IEEE Access, 7, 132161–132172. https://doi.org/10.1109/ACCESS.2019.2940281