Graph neural networks (GNNs) have been widely used for 3D human pose estimation, since the pose of a human body can be naturally modeled as a graph. However, most existing GNN-based models rely on filters with restricted receptive fields and on single-scale information, neglecting valuable multi-scale context. To tackle this issue, we propose a novel model, the Graph Transformer Encoder-Decoder with Atrous Convolution (PoseGTAC), to effectively extract multi-scale context and long-range information. Specifically, PoseGTAC has two key components: the Graph Atrous Convolution (GAC) and the Graph Transformer Layer (GTL), which respectively extract local multi-scale and global long-range information. These components are combined and stacked in an encoder-decoder structure, where graph pooling and unpooling enable the interaction of multi-scale information from local to global scales (e.g., part-scale and body-scale). Extensive experiments on the Human3.6M and MPI-INF-3DHP datasets demonstrate that the proposed PoseGTAC model achieves state-of-the-art performance.
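To make the multi-scale idea behind graph atrous convolution concrete, the sketch below aggregates joint features from neighbors at several hop distances (dilation rates) rather than only 1-hop neighbors. The function names, the mean-aggregation rule, and the toy 4-joint chain graph are all illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def k_hop_adjacency(A, k):
    """Binary adjacency linking nodes that are exactly k hops apart."""
    n = A.shape[0]
    within_k = np.linalg.matrix_power(A + np.eye(n), k) > 0      # <= k hops (incl. self)
    within_k1 = np.linalg.matrix_power(A + np.eye(n), k - 1) > 0  # <= k-1 hops
    return (within_k & ~within_k1).astype(float)

def graph_atrous_aggregate(X, A, rates=(1, 2, 3)):
    """Concatenate mean-aggregated features from each dilation rate."""
    outs = []
    for k in rates:
        Ak = k_hop_adjacency(A, k)
        deg = Ak.sum(axis=1, keepdims=True)
        deg[deg == 0] = 1.0                    # isolated scale: avoid divide-by-zero
        outs.append((Ak @ X) / deg)            # mean over exactly-k-hop neighbours
    return np.concatenate(outs, axis=1)

# Toy 4-joint chain graph: 0-1-2-3, with one-hot joint features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.eye(4)
out = graph_atrous_aggregate(X, A)
print(out.shape)  # (4, 12): 3 dilation rates, each producing 4 feature dims
```

Increasing the dilation rate widens the receptive field without deepening the network, which is the same trade-off atrous convolution offers on image grids.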
Zhu, Y., Xu, X., Shen, F., Ji, Y., Gao, L., & Shen, H. T. (2021). PoseGTAC: Graph Transformer Encoder-Decoder with Atrous Convolution for 3D Human Pose Estimation. In IJCAI International Joint Conference on Artificial Intelligence (pp. 1359–1365). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2021/188