3D human pose and shape estimation (3D-HPSE) from video aims to generate a sequence of 3D meshes depicting the human body in the video. Current deep-learning-based 3D-HPSE networks that take video input have focused on improving temporal consistency among the sequence of 3D joints by supervising the acceleration error between predicted and ground-truth human motion. However, these methods overlook the persistent geometric discrepancy between the path traced by the sequence of predicted joints and that traced by the ground-truth joints. To this end, we propose the Joint Path Alignment (JPA) framework, a model-agnostic approach that mitigates this geometric misalignment by introducing a Temporal Procrustes Alignment Regularization (TPAR) loss, which performs group-wise sequence learning of joint movement paths. Unlike previous methods that rely solely on per-frame supervision for accuracy, our framework adds sequence-level accuracy supervision through the TPAR loss by performing Procrustes analysis on the geometric paths traced by sequences of predicted joints. Our experiments show that the JPA framework advances the network beyond previous state-of-the-art performance on benchmark datasets in both per-frame accuracy and video smoothness metrics.
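To make the core idea concrete, the following is a minimal sketch of a sequence-level loss in the spirit of TPAR: each joint's predicted path over a clip is aligned to the ground-truth path with similarity Procrustes analysis (translation, scale, rotation via SVD) before the error is measured, so only residual shape differences between the paths are penalized. The function names and the per-joint averaging are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def procrustes_align_path(pred_path, gt_path):
    """Align one joint's predicted path (T, 3) onto its ground-truth path
    with similarity Procrustes analysis. Illustrative sketch only."""
    mu_p = pred_path.mean(axis=0)
    mu_g = gt_path.mean(axis=0)
    p = pred_path - mu_p              # center both paths
    g = gt_path - mu_g
    H = p.T @ g                       # 3x3 cross-covariance of the paths
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))
    D = np.diag([1.0, 1.0, d])        # guard against improper rotations
    R = U @ D @ Vt                    # optimal rotation (applied to rows of p)
    s = (S * [1.0, 1.0, d]).sum() / (p ** 2).sum()  # optimal scale
    return s * p @ R + mu_g

def tpar_style_loss(pred, gt):
    """Mean squared error between each Procrustes-aligned predicted joint
    path and its ground-truth path. pred, gt: (J, T, 3) arrays —
    J joints, T frames. A hypothetical stand-in for the TPAR loss."""
    per_joint = [
        ((procrustes_align_path(pred[j], gt[j]) - gt[j]) ** 2).mean()
        for j in range(pred.shape[0])
    ]
    return float(np.mean(per_joint))
```

By construction, a predicted motion that differs from the ground truth only by a global similarity transform incurs zero loss, so the supervision focuses on the shape of the joint paths rather than their absolute placement.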
Hong, J. W., Yoon, S., Kim, J., & Yoo, C. D. (2023). Joint Path Alignment Framework for 3D Human Pose and Shape Estimation From Video. IEEE Access, 11, 43267–43275. https://doi.org/10.1109/ACCESS.2023.3271285