We propose a two-stage pipeline that formulates 2D hand keypoint localization as a conditional video generation problem. The first stage learns a mapping from an input depth video in the source domain to an output depth video in which each of the five fingertips carries a distinct color mark, while enforcing temporal consistency constraints. In the second stage, we apply color segmentation in the HSV space to the translated video and extract the center of each segmented region as the 2D fingertip coordinates. To the best of our knowledge, this is the first work to localize fingertips in depth videos through domain adaptation. Comparative experiments against state-of-the-art single-frame hand pose estimation on the challenging NYU dataset demonstrate that, by exploiting temporal information, our model maintains better hand appearance consistency in the video-to-video synthesis stage, which leads to accurate 2D hand pose estimates under the motion blur caused by fast hand motion.
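The second-stage fingertip extraction can be sketched as follows. This is a minimal illustration, not the authors' implementation: the hue ranges, saturation/value thresholds, and the `locate_fingertips` helper are all hypothetical placeholders, since the abstract does not specify the actual mark colors. It shows the general idea of thresholding each mark's hue band in HSV space and taking the centroid of the segmented blob as the 2D fingertip coordinate.

```python
import numpy as np

# Hypothetical hue ranges (degrees) for the five fingertip color marks;
# the paper's actual mark colors are not specified in the abstract.
FINGERTIP_HUES = {
    "thumb":  (0, 10),
    "index":  (50, 70),
    "middle": (110, 130),
    "ring":   (170, 190),
    "pinky":  (230, 250),
}

def locate_fingertips(hsv_image, sat_min=0.5, val_min=0.5):
    """Return {finger: (row, col)} centroids of each color-marked region.

    hsv_image: H x W x 3 array with hue in [0, 360) and
    saturation/value in [0, 1].
    """
    coords = {}
    h, s, v = hsv_image[..., 0], hsv_image[..., 1], hsv_image[..., 2]
    for finger, (lo, hi) in FINGERTIP_HUES.items():
        # Segment pixels whose hue falls in this mark's band and that
        # are saturated/bright enough to be a color mark.
        mask = (h >= lo) & (h <= hi) & (s >= sat_min) & (v >= val_min)
        ys, xs = np.nonzero(mask)
        if ys.size:
            # Centroid of the segmented blob = 2D fingertip coordinate.
            coords[finger] = (float(ys.mean()), float(xs.mean()))
    return coords

# Synthetic example: a single "index" mark painted around pixel (20, 30).
img = np.zeros((64, 64, 3))
img[18:23, 28:33] = (60.0, 1.0, 1.0)  # hue 60 falls in the index band
print(locate_fingertips(img))  # → {'index': (20.0, 30.0)}
```

In practice one would run this per frame of the translated video; a real implementation would also handle hue wrap-around near 0/360 degrees for red marks.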
Farahanipad, F., Nasr, M. S., Rezaei, M., Kamangar, F., Athitsos, V., & Huber, M. (2022). 2D Fingertip Localization on Depth Videos Using Paired Video-to-Video Translation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13599 LNCS, pp. 381–392). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-20716-7_30