A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion

Qing Gao; Uchenna Emeoha Ogenyi; Jinguo Liu; Zhaojie Ju; Honghai Liu

Conference Proceedings

Advances in Intelligent Systems and Computing (2020) 1043 107-118

DOI: 10.1007/978-3-030-29933-0_9

10 Citations

24 Readers

Abstract

Vision-based hand gesture recognition plays an important role in human-robot interaction (HRI) because this non-contact method enables natural and friendly interaction between people and robots. To address this task, a two-stream CNN framework (2S-CNN) is proposed that recognizes American Sign Language (ASL) hand gestures by fusing multimodal (RGB and depth) data. First, the hand gesture data is enhanced to remove the influence of background and noise. Second, two CNN streams extract RGB and depth features, respectively, for hand gesture recognition. Finally, a fusion layer combines the recognition results of the two streams. Using multimodal data in this way increases the recognition accuracy for ASL hand gestures. Experiments show that 2S-CNN reaches a recognition accuracy of 92.08% on the ASL fingerspelling database, which is higher than that of the baseline methods.
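The final fusion step can be illustrated with a small sketch. The abstract does not say how the fusion layer combines the two streams, so the code below assumes one common late-fusion choice: a weighted average of each stream's class probabilities. The function names, the `rgb_weight` parameter, and the toy scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(rgb_logits, depth_logits, rgb_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    rgb_weight is an assumed hyperparameter, not a value from the paper.
    """
    if len(rgb_logits) != len(depth_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    rgb_probs = softmax(rgb_logits)
    depth_probs = softmax(depth_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * d
        for r, d in zip(rgb_probs, depth_probs)
    ]


def predict(rgb_logits, depth_logits, rgb_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    fused = fuse_streams(rgb_logits, depth_logits, rgb_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy scores for three gesture classes (made-up numbers for illustration).
rgb_logits = [2.0, 1.9, 0.1]    # RGB stream narrowly prefers class 0
depth_logits = [0.2, 3.0, 0.1]  # depth stream strongly prefers class 1

fused = fuse_streams(rgb_logits, depth_logits)
label = predict(rgb_logits, depth_logits)
```

In this toy case the RGB stream alone would pick class 0, while the fused scores pick class 1 because the depth stream is far more confident. This is the kind of disagreement that combining modalities is meant to resolve.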

Cite

Gao, Q., Ogenyi, U. E., Liu, J., Ju, Z., & Liu, H. (2020). A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion. In Advances in Intelligent Systems and Computing (Vol. 1043, pp. 107–118). Springer Verlag. https://doi.org/10.1007/978-3-030-29933-0_9
