In this paper, we propose a deep generative model, the multimodal conditional deep belief network (MCDBN), for cross-modal learning of 3D motion data and their non-injective 2D projections on the image plane. The model has a three-part structure that learns the conditional probability distribution of 3D motion data given their 2D projections. Two distinct conditional deep belief networks (CDBNs) encode the real-valued spatiotemporal patterns of the 2D and 3D motion time series captured from the subjects' movements into compact representations. The third part is a multimodal restricted Boltzmann machine that, during training, learns the relationship between the compact representations of the two modalities via a variation-of-information criterion. As a result, conditioned on 2D motion data obtained from a video, the MCDBN can regenerate the corresponding 3D motion data in the generation phase. We introduce the Pearson correlation coefficient between the ground-truth and regenerated motion signals as a new evaluation metric for motion reconstruction problems. The model is trained on human motion capture data, and the results show that the real and regenerated signals are highly correlated, indicating that the model reproduces the dynamical patterns of the motion accurately.
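The proposed evaluation metric, the Pearson correlation between ground-truth and regenerated motion signals, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `motion_pearson_score` and the convention of treating rows as time steps and columns as joint-coordinate channels are assumptions for the example.

```python
import numpy as np

def motion_pearson_score(gt, rec):
    """Mean Pearson correlation coefficient across motion channels.

    gt, rec : arrays of shape (T, C), rows = time steps,
              columns = joint-coordinate channels (assumed layout).
    Returns the channel-wise Pearson r averaged over channels.
    """
    gt = np.asarray(gt, dtype=float)
    rec = np.asarray(rec, dtype=float)
    # Center each channel over time
    gt_c = gt - gt.mean(axis=0)
    rec_c = rec - rec.mean(axis=0)
    # Pearson r per channel: covariance over product of std deviations
    num = (gt_c * rec_c).sum(axis=0)
    den = np.sqrt((gt_c ** 2).sum(axis=0) * (rec_c ** 2).sum(axis=0))
    r = num / den
    return r.mean()
```

A perfectly reconstructed signal yields a score of 1, while an inverted signal yields -1, so scores close to 1 indicate that the dynamical patterns of the motion are reproduced faithfully.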
Heydari, M. J., & Shiry Ghidary, S. (2019). 3D Motion Reconstruction From 2D Motion Data Using Multimodal Conditional Deep Belief Network. IEEE Access, 7, 56389–56408. https://doi.org/10.1109/ACCESS.2019.2904117