Depictions of similar human body configurations can vary with changing viewpoints. Using only 2D information, we would like to enable vision algorithms to recognize similarity in human body poses across multiple views. This ability is useful for analyzing body movements and human behaviors in images and videos. In this paper, we propose an approach for learning a compact view-invariant embedding space from 2D joint keypoints alone, without explicitly predicting 3D poses. Since 2D poses are projected from 3D space, they have an inherent ambiguity, which is difficult to represent through a deterministic mapping. Hence, we use probabilistic embeddings to model this input uncertainty. Experimental results show that our embedding model achieves higher accuracy when retrieving similar poses across different camera views, in comparison with 2D-to-3D pose lifting models. We also demonstrate the effectiveness of applying our embeddings to view-invariant action recognition and video alignment. Our code is available at https://github.com/google-research/google-research/tree/master/poem.
CITATION STYLE
Sun, J. J., Zhao, J., Chen, L. C., Schroff, F., Adam, H., & Liu, T. (2020). View-Invariant Probabilistic Embedding for Human Pose. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12350 LNCS, pp. 53–70). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-58558-7_4
Mendeley helps you to discover research relevant for your work.