Prediction of Voice Fundamental Frequency and Intensity from Surface Electromyographic Signals of the Face and Neck

1Citations
Citations of this article
8Readers
Mendeley users who have this article in their library.

Abstract

Silent speech interfaces (SSIs) enable speech recognition and synthesis in the absence of an acoustic signal. Yet, the archetypal SSI fails to convey the expressive attributes of prosody such as pitch and loudness, leading to lexical ambiguities. The aim of this study was to determine the efficacy of using surface electromyography (sEMG) as an approach for predicting continuous acoustic estimates of prosody. Ten participants performed a series of vocal tasks including sustained vowels, phrases, and monologues while acoustic data was recorded simultaneously with sEMG activity from muscles of the face and neck. A battery of time-, frequency-, and cepstral-domain features extracted from the sEMG signals were used to train deep regression neural networks to predict fundamental frequency and intensity contours from the acoustic signals. We achieved an average accuracy of 0.01 ST and precision of 0.56 ST for the estimation of fundamental frequency, and an average accuracy of 0.21 dB SPL and precision of 3.25 dB SPL for the estimation of intensity. This work highlights the importance of using sEMG as an alternative means of detecting prosody and shows promise for improving SSIs in future development.

Cite

CITATION STYLE

APA

Vojtech, J. M., Mitchell, C. L., Raiff, L., Kline, J. C., & De Luca, G. (2022). Prediction of Voice Fundamental Frequency and Intensity from Surface Electromyographic Signals of the Face and Neck. Vibration, 5(4), 692–710. https://doi.org/10.3390/vibration5040041

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free