The human body skeleton, modeled as a spatiotemporal graph, has drawn increasing attention from researchers who adopt graph convolutional networks (GCNs) to mine discriminative features from skeleton joints. However, one flaw of GCNs is their inability to capture long-range dependencies between joints. To address this, the graph attention network (GAT) was recently proposed; it combines graph convolutions with a self-attention mechanism to identify the most informative joints of a human skeleton and improve model accuracy. However, GAT computes only static attention: the ranking of attention scores is the same for every query node, which severely limits the expressivity of the attention mechanism. In this work, we present a spatial-temporal dynamic graph attention network (ST-DGAT) to learn the spatial-temporal patterns of skeleton sequences. To obtain dynamic graph attention, we reorder the weighted vector operations in GAT; the resulting approach achieves a global approximate attention function, making it strictly more expressive than GAT. Experiments show that, by simply reordering GAT's internal operations, the proposed model achieves better action classification results while keeping the same computational cost as GAT. The proposed framework was evaluated on the well-known, publicly available large-scale datasets NTU60, NTU120, and Kinetics-400, where it notably outperforms state-of-the-art (SOTA) results with accuracies of 96.4%, 88.2%, and 61.0%, respectively.
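The abstract does not spell out the reordering itself. As a rough illustration of the static-versus-dynamic distinction it describes, the sketch below contrasts the original GAT scoring, where the attention vector a is applied after the LeakyReLU of already-transformed features, with a reordered (GATv2-style) scoring in which the shared weights and nonlinearity come first and a is applied last, so the neighbor ranking can vary per query node. The class names, tensor shapes, single-head setup, and the use of PyTorch are assumptions for illustration only, not the authors' ST-DGAT implementation.

```python
# Minimal sketch (assumption: PyTorch, single attention head, dense adjacency).
# Contrasts static GAT scoring with the reordered "dynamic" scoring the
# abstract alludes to; not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class StaticGATScore(nn.Module):
    """Original GAT: e_ij = LeakyReLU(a^T [W h_i || W h_j]).
    Because a is applied after the fixed LeakyReLU, the neighbor ranking
    is the same for every query node (static attention)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        self.a_src = nn.Linear(out_dim, 1, bias=False)
        self.a_dst = nn.Linear(out_dim, 1, bias=False)

    def forward(self, h, adj):
        z = self.W(h)                                       # (N, out_dim)
        e = self.a_src(z) + self.a_dst(z).transpose(0, 1)   # (N, N) pairwise scores
        e = F.leaky_relu(e, negative_slope=0.2)
        e = e.masked_fill(adj == 0, float("-inf"))
        return torch.softmax(e, dim=-1)                     # attention per query row


class DynamicGATScore(nn.Module):
    """Reordered scoring: e_ij = a^T LeakyReLU(W [h_i || h_j]).
    Moving a outside the nonlinearity lets the neighbor ranking depend on
    the query node, i.e. dynamic attention."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.W_src = nn.Linear(in_dim, out_dim, bias=False)
        self.W_dst = nn.Linear(in_dim, out_dim, bias=False)
        self.a = nn.Linear(out_dim, 1, bias=False)

    def forward(self, h, adj):
        # W [h_i || h_j] decomposed as W_src h_i + W_dst h_j
        z = self.W_src(h).unsqueeze(1) + self.W_dst(h).unsqueeze(0)   # (N, N, out_dim)
        e = self.a(F.leaky_relu(z, negative_slope=0.2)).squeeze(-1)   # (N, N)
        e = e.masked_fill(adj == 0, float("-inf"))
        return torch.softmax(e, dim=-1)


if __name__ == "__main__":
    N, C = 25, 64                      # e.g. 25 skeleton joints, 64-dim features
    h = torch.randn(N, C)
    adj = torch.ones(N, N)             # fully connected graph for illustration
    print(StaticGATScore(C, C)(h, adj).shape)   # torch.Size([25, 25])
    print(DynamicGATScore(C, C)(h, adj).shape)  # torch.Size([25, 25])
```

Both scoring functions have the same parameter count and asymptotic cost, which is consistent with the abstract's claim that the dynamic variant keeps the same computing cost as GAT.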
Rahevar, M., Ganatra, A., Saba, T., Rehman, A., & Bahaj, S. A. (2023). Spatial-Temporal Dynamic Graph Attention Network for Skeleton-Based Action Recognition. IEEE Access, 11, 21546–21553. https://doi.org/10.1109/ACCESS.2023.3247820