Recently, human activity recognition using skeleton data has attracted growing interest because skeleton data is easy to acquire and captures fine shape details. However, it still suffers from wide intra-class variation, inter-class similarity among actions, and view variation, all of which make the extraction of discriminative spatial and temporal features a challenging problem. In this regard, we present a novel Residual Inception Attention Driven CNN network (RIAC-Net), which visualizes the dynamics of the action in a part-wise manner. The complete skeleton is partitioned into five key parts: Head to Spine, Left Leg, Right Leg, Left Hand, and Right Hand. For each part, a Compact Action Skeleton Sequence (CASS) is defined. Part-wise skeleton-based motion dynamics highlight discriminative local features of the skeleton, which helps to overcome the challenges of inter-class similarity and intra-class variation and improves recognition performance. The RIAC-Net architecture is inspired by the concept of inception-residual representation and unifies Attention Driven Residues (ADR) with inception-based Spatio-temporal Convolution Features (STCF) to learn efficient salient action features. An ablation study is also carried out to analyze the effect of ADR compared with a simple residue-based action representation. The robustness of the proposed framework is evaluated through extensive experiments on four challenging datasets: UT Kinect Action 3D, Florence 3D action, MSR Daily Action3D, and NTU RGB-D. The results consistently demonstrate the superiority of the proposed method over other state-of-the-art methods.
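As a rough illustration of the five-part partition described above, the following minimal sketch splits a skeleton sequence into per-part sub-sequences. The joint indices, the 20-joint layout, and the names `PARTS` and `split_into_parts` are hypothetical placeholders; the abstract does not state the actual joint-to-part assignment or how a CASS is built from each part.

```python
# Hypothetical joint indices for a 20-joint skeleton. These are placeholders:
# the joint-to-part assignment used by RIAC-Net is not given in the abstract.
PARTS = {
    "head_to_spine": [0, 1, 2, 3],
    "left_hand": [4, 5, 6, 7],
    "right_hand": [8, 9, 10, 11],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}


def split_into_parts(sequence, parts=PARTS):
    """Split a skeleton sequence into per-part sub-sequences.

    sequence: list of frames; each frame is a list of (x, y, z) joint tuples.
    Returns a dict mapping each part name to a list of frames that contain
    only the joints assigned to that part.
    """
    return {
        name: [[frame[j] for j in indices] for frame in sequence]
        for name, indices in parts.items()
    }


# Usage: two frames of a dummy 20-joint skeleton.
frames = [[(float(j), float(t), 0.0) for j in range(20)] for t in range(2)]
by_part = split_into_parts(frames)
```

Each entry of `by_part` could then be processed independently, which is the sense in which part-wise processing exposes local motion features.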
CITATION STYLE
Dhiman, C., Vishwakarma, D. K., & Agarwal, P. (2021). Part-wise Spatio-temporal Attention Driven CNN-based 3D Human Action Recognition. ACM Transactions on Multimedia Computing, Communications and Applications, 17(3). https://doi.org/10.1145/3441628