The estimation of viewpoints and keypoints effectively enhance object detection methods by extracting valuable traits of the object instances. While the output of both processes differ, i.e., angles vs. list of characteristic points, they indeed share the same focus on how the object is placed in the scene, inducing that there is a certain level of correlation between them. Therefore, we propose a convolutional neural network that jointly computes the viewpoint and keypoints for different object categories. By training both tasks together, each task improves the accuracy of the other. Since the labelling of object keypoints is very time consuming for human annotators, we also introduce a new synthetic dataset with automatically generated viewpoint and keypoints annotations. Our proposed network can also be trained on datasets that contain viewpoint and keypoints annotations or only one of them. The experiments show that the proposed approach successfully exploits this implicit correlation between the tasks and outperforms previous techniques that are trained independently.
CITATION STYLE
Panareda Busto, P., & Gall, J. (2019). Joint Viewpoint and Keypoint Estimation with Real and Synthetic Data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11824 LNCS, pp. 107–121). Springer. https://doi.org/10.1007/978-3-030-33676-9_8
Mendeley helps you to discover research relevant for your work.