Multimodal Fusion of Speech and Gesture Recognition based on Deep Learning

4Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

This paper proposes a multimodal fusion architecture based on deep learning. The architecture consists of two forms: speech command and hand gesture. First, the speech and gesture commands input by users are recognized by CNN for speech command recognition and LSTM for hand gesture recognition respectively. Secondly, the obtained results are searched by keywords and compared by similarity degree to obtain recognition results. Finally, the two results are fused to output the final instructions. Experiments show that the proposed multi-mode fusion model is superior to the single-mode fusion model.

Cite

CITATION STYLE

APA

Qiu, X., Feng, Z., Yang, X., & Tian, J. (2020). Multimodal Fusion of Speech and Gesture Recognition based on Deep Learning. In Journal of Physics: Conference Series (Vol. 1453). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1453/1/012092

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free