Abstract
Convolutional neural networks (CNNs) can effectively handle grid-structured data but not dynamic skeletons, which are usually expressed as graph structures. In this study, we first propose a skeleton-based square grid (SSG) for transforming dynamic skeletons into three-dimensional (3D) grid-structured data so that CNNs can be applied to such data. Each SSG contains a joint-based square grid (JSG) and a rigid-based square grid (RSG), based on the intrinsic and extrinsic dependencies of various body parts, respectively. Next, to enhance the ability of deep features to capture the correlations among 3D grid-structured data, a two-stream 3D CNN is constructed to learn spatiotemporal features from the JSG and RSG sequences. Finally, we introduce a soft attention model that selectively focuses on the informative body parts in the skeleton sequences. We validate our model on action recognition using three datasets: NTU RGB+D, Kinetics Motion, and SBU Kinect Interaction. Our experimental results demonstrate the effectiveness of the proposed approach and its superior performance compared with state-of-the-art methods.
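As a rough illustration of the grid transformation and soft attention described in the abstract, the sketch below packs each frame of a skeleton sequence into a square grid and computes softmax weights over a few cells. It is not the authors' SSG layout: the abstract does not specify how joints are assigned to cells, so the row-major ordering is a placeholder assumption, and the function names `skeleton_to_grid` and `soft_attention` are hypothetical. The 5 x 5 grid assumes a 25-joint skeleton, as in NTU RGB+D.

```python
import math


def skeleton_to_grid(frames, side):
    """Pack each frame's joints into a side x side grid.

    frames: list of T frames, each a list of side*side joints,
            each joint an (x, y, z) tuple.
    Returns a nested list indexed [t][row][col] -> (x, y, z).
    ASSUMPTION: joints are placed in row-major order; the paper's
    JSG/RSG arrangements are not reproduced here.
    """
    grids = []
    for joints in frames:
        if len(joints) != side * side:
            raise ValueError(
                "expected %d joints, got %d" % (side * side, len(joints))
            )
        grids.append([joints[r * side:(r + 1) * side] for r in range(side)])
    return grids


def soft_attention(scores):
    """Softmax over scores: non-negative weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy sequence: 4 frames of 25 joints, with made-up coordinates.
frames = [[(float(t), float(j), 0.0) for j in range(25)] for t in range(4)]
grid = skeleton_to_grid(frames, side=5)

# Toy attention scores for three grid cells (hypothetical values).
weights = soft_attention([0.0, 1.0, 2.0])
```

Stacking the per-frame grids along the time axis gives the kind of 3D grid-structured input a 3D CNN can convolve over. In a trained model the attention scores would be learned rather than fixed as they are here.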
Cite
Ding, W., Ding, C., Li, G., & Liu, K. (2021). Skeleton-Based Square Grid for Human Action Recognition with 3D Convolutional Neural Network. IEEE Access, 9, 54078–54089. https://doi.org/10.1109/ACCESS.2021.3059650