Binocular Multi-CNN System for Real-Time 3D Pose Estimation

Abstract

Current practical approaches to depth-aware pose estimation convert a human pose from a monocular 2D image into 3D space with a single computationally intensive convolutional neural network (CNN). This paper introduces the first open-source algorithm for binocular 3D pose estimation. It uses two separate lightweight CNNs to estimate disparity/depth information from a stereoscopic camera input. This multi-CNN fusion scheme makes full-depth sensing possible in real time on a consumer-grade laptop, even if parts of the human body are invisible or occluded. Our real-time system is validated with a proof-of-concept demonstrator composed of two Logitech C930e webcams and a laptop equipped with an Nvidia GTX 1650 Max-Q GPU and an Intel i7-9750H CPU. The demonstrator processes the input camera feeds at 30 fps, and the output can be analyzed visually with a dedicated 3D pose visualizer.
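The abstract does not say how disparity is turned into depth, so the following is only a minimal sketch of the textbook relation for a rectified stereo pair, Z = f * B / d, and not the authors' pipeline. It assumes that matching 2D keypoints have already been detected in the left and right images, and that the focal length in pixels, the baseline in metres, and the principal point (cx, cy) are known from calibration. The function name and all parameter names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point2D = Tuple[float, float]
Point3D = Tuple[float, float, float]


def triangulate_keypoints(
    left_px: Sequence[Point2D],
    right_px: Sequence[Point2D],
    focal_px: float,
    baseline_m: float,
    cx: float,
    cy: float,
) -> List[Optional[Point3D]]:
    """Back-project matched keypoints from a rectified stereo pair to 3D.

    Assumes an ideal pinhole model with both cameras sharing the same
    focal length and principal point (a simplification for illustration).
    Returns one (x, y, z) tuple in metres per keypoint, expressed in the
    left camera frame, or None where the disparity is not positive.
    """
    points: List[Optional[Point3D]] = []
    for (xl, yl), (xr, _yr) in zip(left_px, right_px):
        # In a rectified pair, matching points lie on the same image row,
        # so only the horizontal offset carries depth information.
        disparity = xl - xr
        if disparity <= 0:
            # Zero or negative disparity has no finite depth in front
            # of the cameras; flag the joint as unresolved.
            points.append(None)
            continue
        z = focal_px * baseline_m / disparity
        x = (xl - cx) * z / focal_px
        y = (yl - cy) * z / focal_px
        points.append((x, y, z))
    return points


if __name__ == "__main__":
    # Hypothetical calibration values and keypoints, for illustration only.
    joints = triangulate_keypoints(
        left_px=[(420.0, 240.0), (300.0, 200.0)],
        right_px=[(370.0, 240.0), (300.0, 200.0)],
        focal_px=1000.0,
        baseline_m=0.1,
        cx=320.0,
        cy=240.0,
    )
    print(joints)
```

Returning None for an unresolved joint keeps the failure explicit; a real system would need its own strategy for occluded joints, which the abstract mentions but does not detail.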

Cite

Niemirepo, T. T., Viitanen, M., & Vanne, J. (2020). Binocular Multi-CNN System for Real-Time 3D Pose Estimation. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 4553–4555). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3414456
