Self-supervised Multi-view Synchronization Learning for 3D Pose Estimation

Abstract

Current state-of-the-art methods cast monocular 3D human pose estimation as a learning problem by training neural networks on large data sets of images and corresponding skeleton poses. In contrast, we propose an approach that can exploit small annotated data sets by fine-tuning networks pre-trained via self-supervised learning on (large) unlabeled data sets. To drive such networks towards supporting 3D pose estimation during the pre-training step, we introduce a novel self-supervised feature learning task designed to focus on the 3D structure in an image. We exploit images extracted from videos captured with a multi-view camera system. The task is to classify whether two images depict two views of the same scene up to a rigid transformation. In a multi-view data set, where objects deform in a non-rigid manner, a rigid transformation occurs only between two views taken at the exact same time, i.e., when they are synchronized. We demonstrate the effectiveness of the synchronization task on the Human3.6M data set and achieve state-of-the-art results in 3D human pose estimation.
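
The synchronization pretext task described above lends itself to a simple implementation. What follows is a minimal PyTorch sketch, not the authors' code: a Siamese ResNet-18 (an assumed backbone choice) scores whether two views are synchronized, with positive pairs taken from two cameras at the same time step and negative pairs from the same cameras at a temporal offset. The make_pairs helper, the offset value, and the head architecture are all hypothetical illustrations.

import torch
import torch.nn as nn
import torchvision.models as models

class SyncClassifier(nn.Module):
    """Siamese binary classifier: are two views synchronized?"""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)
        backbone.fc = nn.Identity()          # 512-d feature per image
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(2 * 512, 256), nn.ReLU(inplace=True),
            nn.Linear(256, 2),               # synchronized vs. not
        )

    def forward(self, view_a, view_b):
        f_a = self.backbone(view_a)
        f_b = self.backbone(view_b)
        return self.head(torch.cat([f_a, f_b], dim=1))

def make_pairs(video, offset=5):
    """Sample one positive and one negative pair from a multi-view clip.

    `video` is a hypothetical tensor of shape (cams, time, 3, H, W).
    Positive: two cameras at the same time step (views related by a
    rigid transformation). Negative: the same cameras at offset time
    steps, where non-rigid motion breaks the rigid relation.
    """
    cams, time = video.shape[:2]
    t = torch.randint(0, time - offset, (1,)).item()
    c1, c2 = torch.randperm(cams)[:2].tolist()
    pos = (video[c1, t], video[c2, t], 1)           # synchronized
    neg = (video[c1, t], video[c2, t + offset], 0)  # desynchronized
    return pos, neg

if __name__ == "__main__":
    model = SyncClassifier()
    loss_fn = nn.CrossEntropyLoss()
    video = torch.randn(4, 100, 3, 128, 128)  # dummy 4-camera clip
    (a1, b1, y1), (a2, b2, y2) = make_pairs(video)
    x_a = torch.stack([a1, a2])
    x_b = torch.stack([b1, b2])
    labels = torch.tensor([y1, y2])
    loss = loss_fn(model(x_a, x_b), labels)
    loss.backward()

Because the subject moves non-rigidly, only the synchronized pair is related by a rigid transformation, which is exactly the 3D cue the classifier must learn to detect.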

Citation (APA)

Jenni, S., & Favaro, P. (2021). Self-supervised Multi-view Synchronization Learning for 3D Pose Estimation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12626 LNCS, pp. 170–187). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-69541-5_11
