Vision-based hand gesture recognition is an important technique in human-robot interaction (HRI). Because it requires no physical contact, it allows people and robots to interact naturally. This paper proposes a two-stream convolutional neural network framework (2S-CNN) that recognizes American Sign Language (ASL) hand gestures by fusing multimodal (RGB and depth) data. First, the hand gesture data is enhanced to suppress the influence of the background and noise. Second, two CNN streams extract RGB and depth features, respectively, for gesture recognition. Finally, a fusion layer combines the recognition results of the two streams. By exploiting multimodal data, the method improves the recognition accuracy of ASL hand gestures. Experiments show that 2S-CNN reaches a recognition accuracy of 92.08% on the ASL fingerspelling database, higher than that of the baseline methods.
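The abstract does not say how the enhancement step or the fusion layer work internally. As a rough illustration only, the sketch below assumes two common choices. The first is a depth-threshold mask as a stand-in for background removal. The second is score-level (late) fusion, in which each stream's class logits are turned into probabilities with softmax and then combined as a weighted average. The function names (`mask_background`, `fuse_streams`, `predict`), the `max_depth` and `rgb_weight` parameters, and the toy logits are hypothetical and do not come from the paper. The CNN streams themselves are omitted, and their outputs are taken as given.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: the paper's actual enhancement and fusion layers are not
# described in the abstract; thresholds and weights here are illustrative only.


def mask_background(rgb: List[List[float]], depth: List[List[float]],
                    max_depth: float) -> List[List[float]]:
    """Zero every pixel whose depth is beyond max_depth (assumed hand range).

    A depth of 0 is treated as 'no sensor reading' and is masked out too.
    """
    return [
        [p if 0 < d <= max_depth else 0.0 for p, d in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one vector of class scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(rgb_logits: Sequence[float], depth_logits: Sequence[float],
                 rgb_weight: float = 0.5) -> List[float]:
    """Late fusion: weighted average of the two streams' class probabilities."""
    if len(rgb_logits) != len(depth_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    return [rgb_weight * a + (1.0 - rgb_weight) * b
            for a, b in zip(p_rgb, p_depth)]


def predict(rgb_logits: Sequence[float], depth_logits: Sequence[float],
            rgb_weight: float = 0.5) -> int:
    """Return the index of the class with the highest fused probability."""
    fused = fuse_streams(rgb_logits, depth_logits, rgb_weight)
    return max(range(len(fused)), key=fused.__getitem__)


if __name__ == "__main__":
    # Toy 3-class example: the RGB stream favours class 0, the depth stream class 1.
    rgb_scores, depth_scores = [2.0, 1.0, 0.1], [0.5, 2.5, 0.2]
    print(predict(rgb_scores, depth_scores))                  # equal weights
    print(predict(rgb_scores, depth_scores, rgb_weight=0.9))  # trust RGB more
```

With equal weights the confident depth stream wins (class 1). Shifting `rgb_weight` to 0.9 flips the decision to class 0. This shows why a fusion layer, whether fixed or learned, matters when the two modalities disagree.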
Gao, Q., Ogenyi, U. E., Liu, J., Ju, Z., & Liu, H. (2020). A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion. In Advances in Intelligent Systems and Computing (Vol. 1043, pp. 107–118). Springer Verlag. https://doi.org/10.1007/978-3-030-29933-0_9