Human video prediction is still a challenging problem due to the uncertainty of future actions and the complexity of frame details. Recent methods tackle this problem in two steps: first forecasting future human poses from the initial ones, and then generating realistic frames conditioned on the predicted poses. Following this framework, we propose a novel Graph Convolutional Network (GCN) based pose predictor that comprehensively models human body joints and forecasts their positions holistically, together with a stacked generative model with a temporal discriminator that iteratively refines the quality of the generated videos. The GCN-based pose predictor fully considers the relationships among body joints and produces more plausible pose predictions. Guided by the predicted poses, the temporal discriminator encodes temporal information into future frame generation to achieve high-quality results, and the stacked residual refinement generators make the results more realistic. Extensive experiments on benchmark datasets demonstrate that the proposed method produces better predictions than state-of-the-art methods and achieves up to a 15% improvement in PSNR.
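The abstract describes a GCN that models relationships among body joints to predict poses. As a rough illustration of the underlying graph-convolution operation, the sketch below applies one normalized graph convolution over a toy five-joint skeleton; the joint set, adjacency, weights, and layer shape are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

np.random.seed(0)

NUM_JOINTS = 5   # toy skeleton: head, torso, left arm, right arm, legs
FEAT_DIM = 2     # (x, y) position of each joint

# Adjacency with self-loops: each joint aggregates features from its
# skeletal neighbours (illustrative connectivity, not the paper's graph).
A = np.array([
    [1, 1, 0, 0, 0],  # head  - torso
    [1, 1, 1, 1, 1],  # torso - all limbs
    [0, 1, 1, 0, 0],  # left arm  - torso
    [0, 1, 0, 1, 0],  # right arm - torso
    [0, 1, 0, 0, 1],  # legs - torso
], dtype=float)

# Symmetric degree normalisation: D^{-1/2} A D^{-1/2}
D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
A_hat = D_inv_sqrt @ A @ D_inv_sqrt

def gcn_layer(X, W):
    """One graph convolution: aggregate neighbour features, then project."""
    return np.maximum(A_hat @ X @ W, 0.0)  # ReLU activation

# Current joint positions (NUM_JOINTS x FEAT_DIM) and a random weight matrix;
# in practice W would be learned from pose sequences.
X = np.random.randn(NUM_JOINTS, FEAT_DIM)
W = np.random.randn(FEAT_DIM, FEAT_DIM)

H = gcn_layer(X, W)
print(H.shape)  # per-joint features passed to the next layer
```

Because every joint's update mixes in its neighbours' positions through `A_hat`, a stack of such layers lets the predictor reason about the whole skeleton jointly rather than forecasting each joint independently.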
Zhao, Y., & Dou, Y. (2020). Pose-Forecasting Aided Human Video Prediction with Graph Convolutional Networks. IEEE Access, 8, 147256–147264. https://doi.org/10.1109/ACCESS.2020.2995383