Abstract
Data for sign language research is often difficult and costly to acquire. We therefore present a novel pipeline that generates three-dimensional (3D) skeleton motion data from single-camera sign language videos alone. First, three recurrent neural networks are trained to infer the 3D positions of body, face, and finger joints, yielding a high-resolution representation of the signer's skeleton. Subsequently, the angular displacements of all joints over time are estimated using inverse kinematics and mapped to a virtual sign avatar for animation. Last, the generated data are evaluated in detail, including sign language recognition and sign language synthesis scenarios. Utilizing a neural word classifier trained on real motion capture data, we reliably classify word segments built from our newly generated position data with accuracy similar to that of motion capture data (absolute difference 3.8%). Furthermore, qualitative evaluation of sign animations shows that the avatar performs natural movements that are comprehensible and resemble animations created with original motion capture data.
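The first pipeline stage described above (recurrent networks lifting video keypoints to 3D joint positions) can be sketched as follows. This is a minimal illustrative example with assumed dimensions and randomly initialized weights; it is not the authors' actual architecture or trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes for illustration only (not from the paper).
N_KEYPOINTS_2D = 25   # 2D keypoints extracted per video frame
N_JOINTS_3D = 25      # 3D skeleton joints to predict
HIDDEN = 64           # recurrent hidden-state size

# Random weights stand in for trained parameters.
W_xh = rng.normal(0, 0.1, (HIDDEN, N_KEYPOINTS_2D * 2))
W_hh = rng.normal(0, 0.1, (HIDDEN, HIDDEN))
W_hy = rng.normal(0, 0.1, (N_JOINTS_3D * 3, HIDDEN))

def rnn_lift_to_3d(frames_2d):
    """Run a vanilla RNN over a sequence of flattened 2D keypoint
    frames and emit one (joints, 3) position estimate per frame."""
    h = np.zeros(HIDDEN)
    out = []
    for x in frames_2d:                    # x: flattened (x, y) keypoints
        h = np.tanh(W_xh @ x + W_hh @ h)   # recurrent state update
        out.append((W_hy @ h).reshape(N_JOINTS_3D, 3))
    return np.stack(out)                   # shape: (frames, joints, 3)

# Dummy 10-frame clip of 2D keypoints.
clip = rng.normal(size=(10, N_KEYPOINTS_2D * 2))
positions_3d = rnn_lift_to_3d(clip)
print(positions_3d.shape)  # → (10, 25, 3)
```

In the paper, three such networks handle body, face, and finger joints separately; the per-frame 3D positions they produce then feed the inverse-kinematics stage that drives the avatar.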
Brock, H., Law, F., Nakadai, K., & Nagashima, Y. (2020). Learning Three-dimensional Skeleton Data from Sign Language Video. ACM Transactions on Intelligent Systems and Technology, 11(3). https://doi.org/10.1145/3377552