Learning Three-dimensional Skeleton Data from Sign Language Video


Abstract

Data for sign language research is often difficult and costly to acquire. We therefore present a novel pipeline that generates three-dimensional (3D) skeleton motion data from single-camera sign language videos alone. First, three recurrent neural networks are trained to infer the 3D positions of body, face, and finger joints, yielding a high-resolution skeleton of the signer. Subsequently, the angular displacements of all joints over time are estimated via inverse kinematics and mapped onto a virtual sign avatar for animation. Finally, the generated data are evaluated in detail in both a sign language recognition and a sign language synthesis scenario. Using a neural word classifier trained on real motion capture data, we reliably classify word segments built from our newly generated position data with accuracy similar to that of motion capture data (absolute difference 3.8%). Furthermore, a qualitative evaluation of the sign animations shows that the avatar performs natural movements that are comprehensible and resemble animations created from the original motion capture data.
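To make the first pipeline stage concrete, below is a minimal sketch of one of the three recurrent pose estimators described in the abstract: a network that maps a sequence of per-frame video features to per-frame 3D joint positions. This is not the authors' code; the framework (PyTorch), layer types, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch of one recurrent joint-position estimator (e.g., the
# body network); the paper uses three such networks for body, face, and
# fingers. Architecture and sizes are assumptions, not taken from the paper.
import torch
import torch.nn as nn

class JointPositionRNN(nn.Module):
    def __init__(self, feature_dim: int, num_joints: int, hidden_dim: int = 256):
        super().__init__()
        # Recurrent encoder over the frame sequence.
        self.rnn = nn.LSTM(feature_dim, hidden_dim, num_layers=2, batch_first=True)
        # Per-frame regression head: hidden state -> (x, y, z) per joint.
        self.head = nn.Linear(hidden_dim, num_joints * 3)
        self.num_joints = num_joints

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, feature_dim) per-frame features from the video
        hidden, _ = self.rnn(frames)
        joints = self.head(hidden)  # (batch, time, num_joints * 3)
        return joints.view(*joints.shape[:2], self.num_joints, 3)

# Usage: 120 frames of 512-dim features -> 3D positions for 17 body joints.
body_net = JointPositionRNN(feature_dim=512, num_joints=17)
positions = body_net(torch.randn(1, 120, 512))  # shape (1, 120, 17, 3)
```

The resulting per-frame position data would then feed the abstract's second stage, where inverse kinematics converts joint positions into angular displacements for avatar animation.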

Citation (APA)

Brock, H., Law, F., Nakadai, K., & Nagashima, Y. (2020). Learning Three-dimensional Skeleton Data from Sign Language Video. ACM Transactions on Intelligent Systems and Technology, 11(3). https://doi.org/10.1145/3377552
