This paper focuses on 6DoF object pose estimation from a single RGB image. We tackle this challenging problem with a two-stage optimization framework. Specifically, we first introduce a translation estimation module that provides an initial translation based on an estimated depth map. A pose regression module then combines the resulting ROI (Region of Interest) with the original image to predict the rotation and refine the translation. Compared with previous end-to-end methods that directly predict rotations and translations, our method uses depth information as weak guidance and significantly reduces the search space for the subsequent module. Furthermore, we design a new loss function for symmetric objects, which remain exceptionally difficult cases that prior works have struggled to handle. Experiments show that our model achieves state-of-the-art object pose estimation on the YCB-Video (Yale-CMU-Berkeley) dataset.
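To make the two-stage pipeline concrete, below is a minimal PyTorch-style sketch of the inference flow plus a symmetric-object loss. The module names, layer choices, quaternion parameterization, and the ADD-S-style closest-point distance are illustrative assumptions for exposition, not the paper's actual architecture or loss.

```python
# Minimal sketch of a two-stage pose pipeline as described in the abstract.
# All names, shapes, and the symmetric distance are assumptions, not the
# authors' exact implementation.
import torch
import torch.nn as nn


class TranslationEstimator(nn.Module):
    """Stage 1 (assumed interface): predict a depth map from the RGB image
    and derive an initial translation."""
    def __init__(self):
        super().__init__()
        self.depth_net = nn.Sequential(          # placeholder backbone
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )

    def forward(self, rgb):
        depth = self.depth_net(rgb)              # (B, 1, H, W)
        # Toy initial translation: zero (x, y) offset, mean depth as z.
        zeros = torch.zeros(rgb.size(0), device=rgb.device)
        t_init = torch.stack([zeros, zeros, depth.mean(dim=(1, 2, 3))], dim=1)
        return depth, t_init                     # t_init: (B, 3)


class PoseRegressor(nn.Module):
    """Stage 2 (assumed interface): regress a rotation (quaternion) and a
    translation refinement from the ROI crop plus the full image.
    The ROI crop is assumed to be resized to the full-image resolution."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(6, 7),                     # 4 (quaternion) + 3 (delta t)
        )

    def forward(self, roi, rgb):
        feat = torch.cat([roi, rgb], dim=1)      # (B, 6, H, W)
        out = self.head(feat)
        quat = nn.functional.normalize(out[:, :4], dim=1)
        delta_t = out[:, 4:]
        return quat, delta_t


def symmetric_distance_loss(pred_pts, gt_pts):
    """ADD-S-style loss commonly used for symmetric objects: for each
    transformed model point, take the distance to the closest ground-truth
    point. A common choice, not necessarily the paper's exact loss."""
    d = torch.cdist(pred_pts, gt_pts)            # (B, N, M) pairwise distances
    return d.min(dim=2).values.mean()
```

Under these assumptions, inference runs stage 1 to obtain the depth map and initial translation, crops an ROI around the object implied by that translation, and then runs stage 2 on the ROI and the full image; the stage-1 output is what narrows the search space for the rotation and translation regression in stage 2.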
Jin, L., Wang, X., He, M., & Wang, J. (2021). Drnet: A depth-based regression network for 6d object pose estimation. Sensors, 21(5), 1–15. https://doi.org/10.3390/s21051692