A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion

Abstract

Vision-based hand gesture recognition plays an important role in human-robot interaction (HRI): this non-contact method enables natural and friendly interaction between people and robots. To this end, a two-stream CNN framework (2S-CNN) is proposed to recognize American Sign Language (ASL) hand gestures based on multimodal (RGB and depth) data fusion. Firstly, the hand gesture data is enhanced to remove the influence of background and noise. Secondly, RGB and depth features are extracted by CNNs on two separate streams. Finally, a fusion layer is designed to fuse the recognition results of the two streams. Using multimodal data in this way increases the recognition accuracy for ASL hand gestures. Experiments show that 2S-CNN reaches a recognition accuracy of 92.08% on the ASL fingerspelling database, higher than that of the baseline methods.
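The abstract describes a late-fusion architecture: one CNN per modality, with a fusion step combining the two streams' class scores. Below is a minimal PyTorch sketch of that idea. The layer sizes, input resolution (64×64), class count (24, the static ASL fingerspelling letters excluding J and Z), and weighted-sum fusion are all illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn


class StreamCNN(nn.Module):
    """One modality stream: a small CNN mapping an image to per-class
    scores. Layer sizes here are illustrative, not from the paper."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(64 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x).flatten(1))


class TwoStreamCNN(nn.Module):
    """Two-stream model: separate CNNs for RGB and depth, followed by a
    fusion step over the streams' outputs. A weighted sum of softmax
    probabilities is one simple fusion choice; the paper's exact
    fusion-layer design may differ."""

    def __init__(self, num_classes: int = 24, rgb_weight: float = 0.5):
        super().__init__()
        self.rgb_stream = StreamCNN(in_channels=3, num_classes=num_classes)
        self.depth_stream = StreamCNN(in_channels=1, num_classes=num_classes)
        self.rgb_weight = rgb_weight

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        p_rgb = torch.softmax(self.rgb_stream(rgb), dim=1)
        p_depth = torch.softmax(self.depth_stream(depth), dim=1)
        # Late fusion: convex combination of the two streams' probabilities.
        return self.rgb_weight * p_rgb + (1.0 - self.rgb_weight) * p_depth


if __name__ == "__main__":
    model = TwoStreamCNN(num_classes=24)
    rgb = torch.randn(4, 3, 64, 64)    # batch of RGB hand crops
    depth = torch.randn(4, 1, 64, 64)  # aligned depth maps
    probs = model(rgb, depth)
    print(probs.shape)  # torch.Size([4, 24])
```

Fusing after each stream produces class scores (late fusion) keeps the two modalities' feature extractors independent, which is one common way to realize the fusion-layer design the abstract mentions.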

Citation (APA)

Gao, Q., Ogenyi, U. E., Liu, J., Ju, Z., & Liu, H. (2020). A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion. In Advances in Intelligent Systems and Computing (Vol. 1043, pp. 107–118). Springer Verlag. https://doi.org/10.1007/978-3-030-29933-0_9
