Real-Time Multi-View 3D Human Pose Estimation using Semantic Feedback to Smart Edge Sensors


Abstract

We present a novel method for estimating 3D human poses with a multi-camera setup, employing distributed smart edge sensors coupled with a backend through a semantic feedback loop. 2D joint detection for each camera view is performed locally on a dedicated embedded inference processor; only the semantic skeleton representation is transmitted over the network, and raw images remain on the sensor board. 3D poses are recovered from the 2D joints on a central backend via triangulation and a body model that incorporates prior knowledge of the human skeleton. A feedback channel from the backend to the individual sensors is implemented on a semantic level: the allocentric 3D pose is backprojected into the sensor views, where it is fused with the local 2D joint detections. The local semantic model on each sensor is thus improved by incorporating global context information. The whole pipeline is capable of real-time operation. We evaluate our method on three public datasets, where we achieve state-of-the-art results and show the benefits of our feedback architecture, as well as in our own setup for multi-person experiments. Using the feedback signal improves the 2D joint detections and, in turn, the estimated 3D poses.
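The two geometric steps named in the abstract, triangulating 3D joints from per-view 2D detections and backprojecting the recovered 3D pose into each sensor view, can be sketched as follows. This is an illustrative sketch using plain linear (DLT) triangulation with NumPy; the paper's actual recovery additionally uses a body model with skeleton priors, and the function names here are assumptions, not the authors' API.

```python
import numpy as np

def triangulate_joint(points_2d, proj_mats):
    """Linear (DLT) triangulation of one joint from multiple camera views.

    points_2d: list of (x, y) pixel coordinates, one per camera.
    proj_mats: list of 3x4 camera projection matrices (one per camera).
    Returns the 3D joint position as a length-3 array.
    """
    # Each view contributes two linear constraints on the homogeneous 3D point.
    rows = []
    for (x, y), P in zip(points_2d, proj_mats):
        rows.append(x * P[2] - P[0])
        rows.append(y * P[2] - P[1])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    X = vt[-1]
    return X[:3] / X[3]  # homogeneous -> Euclidean

def backproject(X, P):
    """Project a 3D joint X into one sensor view with projection matrix P,
    as done for the semantic feedback signal."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]
```

With consistent detections and calibration, backprojecting a triangulated joint reproduces the per-view 2D coordinates, which is what lets each sensor fuse the feedback with its own local detections.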

Citation (APA)

Bultmann, S., & Behnke, S. (2021). Real-Time Multi-View 3D Human Pose Estimation using Semantic Feedback to Smart Edge Sensors. In Robotics: Science and Systems. MIT Press Journals. https://doi.org/10.15607/RSS.2021.XVII.040
