Semantic keypoint-based pose estimation from single RGB frames

  • Schmeckpeper K
  • Osteen P
  • Wang Y
  • et al.
N/ACitations
Citations of this article
9Readers
Mendeley users who have this article in their library.

Abstract

This paper presents an approach to estimating the continuous 6-DoF pose of an object from a single RGB image. The approach combines semantic keypoints predicted by a convolutional network (convnet) with a deformable shape model. Unlike prior investigators, we are agnostic to whether the object is textured or textureless, as the convnet learns the optimal representation from the available training-image data. Furthermore, the approach can be applied to instanceand class-based pose recovery. Additionally, we accompany our main pipeline with a technique for semi-automatic data generation from unlabeled videos. This procedure allows us to train the learnable components of our method with minimal manual intervention in the labeling process. Empirically, we show that our approach can accurately recover the 6-DoF object pose for both instance- and class-based scenarios even against a cluttered background. We apply our approach both to several, existing, large-scale datasets - including PASCAL3D+, LineMOD-Occluded, YCB-Video, and TUD-Light - and, using our labeling pipeline, to a new dataset with novel object classes that we introduce here. Extensive empirical evaluations show that our approach is able to provide pose estimation results comparable to the state of the art.

Cite

CITATION STYLE

APA

Schmeckpeper, K., Osteen, P., Wang, Y., Pavlakos, G., Chaney, K., Jordan, W., … Daniilidis, K. (2022). Semantic keypoint-based pose estimation from single RGB frames. Field Robotics, 2(1), 147–171. https://doi.org/10.55417/fr.2022006

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free