Dual-input Control Interface for Deep Neural Network Based on Image/Speech Recognition

Neng Sheng Pai; Yi Hsun Chen; Chin Pao Hung; Pi Yun Chen; Ying Che Kuo; Jun Yu Chen

Journal Article, Open Access

Sensors and Materials (2019) 31(11) 3451-3463

DOI: 10.18494/SAM.2019.2481

5 Citations

5 Readers

Abstract

The objective of this study was to design a dual-input video/audio control interface consisting of two input systems, hand posture recognition and speech recognition, so that specific hand postures or voice commands can be used for control without wearable devices. For hand posture recognition, the original video camera images were preprocessed: the face in the image was identified using an AdaBoost classifier and used as the reference point, and an image of a specific size was then selected as the recognition input to increase the recognition speed. A neural network comprising convolutional, activation, max pooling, and fully connected layers was used to classify and recognize hand posture images as well as speech. Long short-term memory (LSTM), a type of recurrent neural network (RNN), was used to achieve speech recognition. Speech features were extracted by preprocessing: a fast Fourier transform (FFT) converted the signals from the time domain to the frequency domain, and the frequency-domain signals were then passed through triangular bandpass filters and a discrete cosine transform to derive Mel-frequency cepstral coefficients (MFCCs) as the speech feature input. These feature parameters were then input to the LSTM neural network to make predictions and achieve speech recognition. Experimental results showed that the image/speech dual-input control interface had good recognition capability.
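The speech feature extraction described in the abstract (FFT, triangular bandpass filters, discrete cosine transform) can be illustrated with a minimal sketch. It uses only the Python standard library and a naive DFT for readability. The frame length, hop size, filter count, coefficient count, Hamming window, and all function names are illustrative assumptions, not values or code from the paper, and the LSTM classifier that would consume these features is not shown.

```python
import cmath
import math


def hz_to_mel(hz):
    # Common mel-scale formula (assumed; the paper's exact variant is not stated).
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window one frame and return its one-sided power spectrum (naive DFT)."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(num_filters, frame_len, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    num_bins = frame_len // 2 + 1
    top = hz_to_mel(sample_rate / 2.0)
    edges = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bank = []
    for m in range(1, num_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(num_bins):
            f = k * sample_rate / frame_len  # centre frequency of DFT bin k
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising edge
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def dct2(values, num_coeffs):
    """Unnormalised DCT-II, keeping the first num_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n) for i, v in enumerate(values))
            for c in range(num_coeffs)]


def mfcc(signal, sample_rate, frame_len=64, hop=32, num_filters=12, num_coeffs=8):
    """Return one list of MFCCs per frame: FFT -> mel filters -> log -> DCT."""
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spec = power_spectrum(signal[start:start + frame_len])
        energies = [sum(w * p for w, p in zip(row, spec)) for row in bank]
        log_energies = [math.log(e + 1e-10) for e in energies]  # offset avoids log(0)
        features.append(dct2(log_energies, num_coeffs))
    return features


# Usage: a synthetic 440 Hz tone stands in for a recorded voice command.
sample_rate = 8000
tone = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(256)]
features = mfcc(tone, sample_rate)
```

Each entry of `features` is the coefficient vector for one frame. In a pipeline like the one the abstract outlines, the sequence of such vectors would form the time-step inputs to the recurrent network.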

Cite

Pai, N. S., Chen, Y. H., Hung, C. P., Chen, P. Y., Kuo, Y. C., & Chen, J. Y. (2019). Dual-input Control Interface for Deep Neural Network Based on Image/Speech Recognition. Sensors and Materials, 31(11), 3451–3463. https://doi.org/10.18494/SAM.2019.2481
