This paper proposes a multimodal fusion architecture based on deep learning that accepts two input modalities: speech commands and hand gestures. First, the user's inputs are recognized separately, using a CNN for speech command recognition and an LSTM for hand gesture recognition. Second, the recognized outputs are refined by keyword search and similarity matching against the command vocabulary. Finally, the two results are fused to produce the final instruction. Experiments show that the proposed multimodal fusion model outperforms either single-modality model.
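The fusion step described above can be sketched as a simple late-fusion rule: each modality's raw hypothesis is matched against a shared command vocabulary by string similarity, and the more confident match wins. This is a minimal illustrative sketch, not the paper's implementation; the command list, function names, and the use of `difflib` similarity are all assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical shared command vocabulary (assumption, not from the paper).
COMMANDS = ["move forward", "move backward", "turn left", "turn right", "stop"]

def best_match(hypothesis, commands):
    """Return (similarity, command) for the vocabulary entry most
    similar to a recognizer's raw hypothesis. Similarity is in [0, 1]."""
    return max((SequenceMatcher(None, hypothesis, c).ratio(), c)
               for c in commands)

def fuse(speech_hypothesis, gesture_hypothesis):
    """Late fusion: match each modality's hypothesis to the vocabulary,
    then keep the command whose match score is higher."""
    s_score, s_cmd = best_match(speech_hypothesis, COMMANDS)
    g_score, g_cmd = best_match(gesture_hypothesis, COMMANDS)
    if s_cmd == g_cmd:
        return s_cmd  # modalities agree; either score will do
    return s_cmd if s_score >= g_score else g_cmd

# Noisy speech hypothesis, clean gesture hypothesis: gesture dominates.
print(fuse("moov forwd", "move forward"))  # -> move forward
```

In practice the recognizers would emit class probabilities rather than strings, and the fusion could weight each modality by its softmax confidence instead of string similarity; the structure of the decision rule stays the same.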
Citation
Qiu, X., Feng, Z., Yang, X., & Tian, J. (2020). Multimodal Fusion of Speech and Gesture Recognition based on Deep Learning. In Journal of Physics: Conference Series (Vol. 1453). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1453/1/012092