Predicting F0 and voicing from NAM-captured whispered speech

Viet Anh Tran; Gérard Bailly; Hélène Loevenbruck; Tomoki Toda

Conference Proceedings

Proceedings of the 4th International Conference on Speech Prosody, SP 2008 (2008) 107-110

DOI: 10.21437/speechprosody.2008-25

Abstract

The NAM-to-speech conversion proposed by Toda and colleagues, which converts Non-Audible Murmur (NAM) into audible speech using a statistical mapping trained on aligned corpora, is a very promising technique. However, its performance is still insufficient, mainly because the F0 of the converted voice is difficult to estimate from unvoiced speech. In this paper, we propose a method to improve F0 estimation and the voicing decision in a NAM-to-speech conversion system based on Gaussian Mixture Models (GMM) applied to whispered speech. Instead of combining the voicing decision and F0 estimation in a single GMM, a simple feed-forward neural network detects voiced segments in the whisper, while a GMM trained on voiced segments only estimates a continuous melodic contour. The network's voiced/unvoiced decision error rate is 6.8%, compared with 9.2% for the original system. The proposed method also reduces the F0 estimation error.
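The split architecture described in the abstract (a feed-forward network for the voiced/unvoiced decision, plus a GMM trained on voiced segments that supplies F0) can be sketched in plain Python as follows. This is not the authors' implementation. The single tanh hidden layer, the diagonal-covariance joint GMM with a conditional-mean (MMSE) mapping, the 0.5 decision threshold, the convention of writing 0.0 Hz for unvoiced frames, the frame-level mismatch definition of the voicing error rate, and all class names, function names and toy parameters are hypothetical assumptions. Training (backpropagation for the network, EM for the GMM) is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VoicingNet:
    """Hypothetical one-hidden-layer feed-forward voiced/unvoiced detector."""
    w_hidden: List[List[float]]  # one weight row per hidden unit
    b_hidden: List[float]
    w_out: List[float]
    b_out: float

    def prob_voiced(self, x: Sequence[float]) -> float:
        # tanh hidden layer followed by a single logistic output unit.
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w_hidden, self.b_hidden)]
        z = sum(w * h for w, h in zip(self.w_out, hidden)) + self.b_out
        # Numerically stable logistic sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


@dataclass
class JointComponent:
    """One component of a hypothetical joint GMM over features x and F0 y.
    Assumes a diagonal covariance for x and a cross-covariance vector x-y."""
    weight: float
    mu_x: List[float]
    var_x: List[float]
    mu_y: float
    cov_xy: List[float]


def log_gauss_diag(x, mu, var) -> float:
    """Log density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mu, var))


def gmm_regress_f0(x: Sequence[float], comps: List[JointComponent]) -> float:
    """Conditional-mean (MMSE) F0 estimate E[y | x] under the joint GMM."""
    log_post = [math.log(c.weight) + log_gauss_diag(x, c.mu_x, c.var_x)
                for c in comps]
    top = max(log_post)  # subtract the max before exponentiating
    post = [math.exp(lp - top) for lp in log_post]
    norm = sum(post)
    f0 = 0.0
    for p, c in zip(post, comps):
        # Per-component linear regression of y on x, weighted by p(m | x).
        cond_mean = c.mu_y + sum(cv / v * (xi - m) for cv, v, xi, m
                                 in zip(c.cov_xy, c.var_x, x, c.mu_x))
        f0 += (p / norm) * cond_mean
    return f0


def predict_f0_contour(frames, net, comps, threshold=0.5):
    """The network decides voicing; the GMM supplies F0 on voiced frames only.
    Unvoiced frames are written as 0.0 Hz (an assumed convention)."""
    return [gmm_regress_f0(x, comps) if net.prob_voiced(x) >= threshold
            else 0.0 for x in frames]


def vuv_error_rate(pred_f0, ref_f0) -> float:
    """Fraction of frames whose voiced/unvoiced decision disagrees."""
    errors = sum((p > 0) != (r > 0) for p, r in zip(pred_f0, ref_f0))
    return errors / len(ref_f0)


# Toy usage with made-up parameters and 2-dimensional "features".
net = VoicingNet(w_hidden=[[1.0, 0.0]], b_hidden=[0.0], w_out=[4.0], b_out=0.0)
gmm = [JointComponent(weight=1.0, mu_x=[0.0, 0.0], var_x=[1.0, 1.0],
                      mu_y=120.0, cov_xy=[10.0, 0.0])]
frames = [[1.0, 0.3], [-1.0, 0.2], [2.0, -0.5]]
contour = predict_f0_contour(frames, net, gmm)
print(contour)  # [130.0, 0.0, 140.0]
print(vuv_error_rate(contour, [125.0, 0.0, 0.0]))  # one of three frames mismatched
```

One way to read the design choice: because the network alone decides where voicing starts and stops, the GMM only has to model the continuous contour inside voiced regions rather than also reproducing the voiced/unvoiced switch.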

Citation (APA)

Tran, V. A., Bailly, G., Loevenbruck, H., & Toda, T. (2008). Predicting F0 and voicing from NAM-captured whispered speech. In Proceedings of the 4th International Conference on Speech Prosody, SP 2008 (pp. 107–110). International Speech Communication Association. https://doi.org/10.21437/speechprosody.2008-25