Current practical approaches to depth-aware pose estimation lift a human pose from a monocular 2D image into 3D space with a single, computationally intensive convolutional neural network (CNN). This paper introduces the first open-source algorithm for binocular 3D pose estimation. It uses two separate lightweight CNNs to estimate disparity/depth information from a stereoscopic camera input. This multi-CNN fusion scheme makes it possible to perform full-depth sensing in real time on a consumer-grade laptop even when parts of the human body are invisible or occluded. Our real-time system is validated with a proof-of-concept demonstrator composed of two Logitech C930e webcams and a laptop equipped with an Nvidia GTX 1650 Max-Q GPU and an Intel i7-9750H CPU. The demonstrator processes the input camera feeds at 30 fps, and the output can be visually analyzed with a dedicated 3D pose visualizer.
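To make the binocular fusion idea concrete, the sketch below shows one way per-joint depth could be recovered once each lightweight CNN has produced 2D keypoints for its own camera view. It is a minimal illustration in Python/NumPy, not the paper's implementation: it assumes rectified, horizontally aligned cameras with a known focal length and baseline, and the function name, keypoint layout, and camera parameters are placeholders chosen for the example.

import numpy as np


def triangulate_joints(kp_left, kp_right, focal_px, baseline_m,
                       principal_point=(640.0, 360.0), min_disparity=1e-3):
    """Lift matched 2D joints from a rectified stereo pair into 3D.

    kp_left, kp_right : (N, 2) arrays of (x, y) pixel coordinates, one row per
                        joint, as produced by the per-camera 2D pose CNNs.
    focal_px          : focal length in pixels (assumed equal for both cameras).
    baseline_m        : horizontal distance between the camera centers in meters.
    principal_point   : (cx, cy) of the rectified images in pixels.

    Returns an (N, 3) array of (X, Y, Z) coordinates in the left-camera frame;
    joints with near-zero or negative disparity come back as NaN.
    """
    kp_left = np.asarray(kp_left, dtype=np.float64)
    kp_right = np.asarray(kp_right, dtype=np.float64)
    cx, cy = principal_point

    # Horizontal disparity of each joint between the two rectified views.
    disparity = kp_left[:, 0] - kp_right[:, 0]
    valid = disparity > min_disparity

    # Pinhole relation for rectified stereo: Z = f * B / d.
    z = np.full(len(kp_left), np.nan)
    z[valid] = focal_px * baseline_m / disparity[valid]

    # Back-project the left-view pixel coordinates to metric X and Y.
    x = (kp_left[:, 0] - cx) * z / focal_px
    y = (kp_left[:, 1] - cy) * z / focal_px
    return np.stack([x, y, z], axis=1)


if __name__ == "__main__":
    # Toy example: two joints detected in both views, with an assumed
    # ~1200 px focal length and 10 cm baseline (placeholder values,
    # not taken from the paper's demonstrator).
    left = [[700.0, 300.0], [660.0, 500.0]]
    right = [[660.0, 300.0], [615.0, 500.0]]
    print(triangulate_joints(left, right, focal_px=1200.0, baseline_m=0.10))

Keeping the per-view 2D networks small and doing the 3D lifting with closed-form stereo geometry, as sketched above, is what lets such a pipeline run on modest laptop hardware, since no single heavy 3D-regression CNN is needed.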
Niemirepo, T. T., Viitanen, M., & Vanne, J. (2020). Binocular Multi-CNN System for Real-Time 3D Pose Estimation. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 4553–4555). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3414456