Fundamental frequency contour reshaping in HMM-based speech synthesis and realization of prosodic focus using generation process model


Abstract

Frame-by-frame representation is not appropriate for prosodic features, which are tightly related to speech units that span long stretches of time, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. Our previously developed method, which modifies generated F0 contours in the framework of the generation process model, is improved to allow multiple phrase components in a breath group. Since the model can clearly relate its commands to linguistic (and para-/non-linguistic) information, the method further enables flexible control of prosody through manipulation of the model commands. Prosodic focus is realized in HMM-based speech synthesis as a supplemental process, based on the differences in command magnitudes/amplitudes between utterances with and without focus. The validity of the method was confirmed by listening experiments on synthetic speech.
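As a rough illustration of what "phrase commands" and "accent commands" mean here, the sketch below evaluates an F0 contour using the commonly cited form of the generation process (Fujisaki) model: a base frequency plus impulse-driven phrase components and step-driven accent components, summed in the log-F0 domain. It is not the authors' implementation. The constants `ALPHA`, `BETA`, and `GAMMA` are typical textbook values, and the command timings, magnitudes, and the `focus_scale` factor are hypothetical values chosen only for the example. The abstract does not say how the focus-related command differences are obtained, so the simple amplitude scaling shown is an assumption.

```python
import math

# Typical textbook constants (assumed, not taken from the abstract).
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, clipped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, base_hz, phrase_cmds, accent_cmds):
    """ln F0 at time t: base level plus all phrase and accent components."""
    value = math.log(base_hz)
    for onset, magnitude in phrase_cmds:
        value += magnitude * phrase_response(t - onset)
    for onset, offset, amplitude in accent_cmds:
        value += amplitude * (accent_response(t - onset) - accent_response(t - offset))
    return value


def f0_contour(duration, step, base_hz, phrase_cmds, accent_cmds):
    """Sample the F0 contour (Hz) every `step` seconds over `duration` seconds."""
    n = int(round(duration / step)) + 1
    return [math.exp(log_f0(i * step, base_hz, phrase_cmds, accent_cmds)) for i in range(n)]


def apply_focus(accent_cmds, index, focus_scale):
    """Hypothetical focus control: scale the amplitude of one accent command."""
    return [
        (on, off, amp * focus_scale if i == index else amp)
        for i, (on, off, amp) in enumerate(accent_cmds)
    ]


# Hypothetical commands: two phrase components within one breath group.
PHRASES = [(0.0, 0.5), (1.2, 0.3)]              # (onset s, magnitude)
ACCENTS = [(0.2, 0.6, 0.4), (1.4, 1.8, 0.3)]    # (onset s, offset s, amplitude)

neutral = f0_contour(2.5, 0.01, 100.0, PHRASES, ACCENTS)
focused = f0_contour(2.5, 0.01, 100.0, PHRASES, apply_focus(ACCENTS, 1, 1.5))
```

Because each command is tied to a linguistic unit (a phrase onset or an accented word), changing one command reshapes the contour only around that unit. In this example, `focused` differs from `neutral` only during and just after the second accent.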

Cite

Hirose, K., Hashimoto, H., Ikeshima, J., & Minematsu, N. (2012). Fundamental frequency contour reshaping in HMM-based speech synthesis and realization of prosodic focus using generation process model. In Proceedings of the 6th International Conference on Speech Prosody, SP 2012 (Vol. 1, pp. 171–174). Tongji University Press. https://doi.org/10.21437/speechprosody.2012-46
