Frame-by-frame representation is not well suited to prosodic features, which are tightly related to speech units spanning a wide time range, such as words and phrases. This causes an inherent problem in fundamental frequency (F0) contour generation by HMM-based speech synthesis. Our formerly developed method, which modifies generated F0 contours in the framework of the generation process model, is improved to allow plural phrase components in a breath group. Since the model can clearly relate its commands to linguistic (and para-/non-linguistic) information, the method further enables flexible control of prosody through manipulation of the model commands. Prosodic focus is realized in HMM-based speech synthesis as a supplemental process, by viewing the differences in command magnitudes/amplitudes between utterances without and with focus. The validity of the method was confirmed by listening experiments on synthetic speech.
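The abstract does not give the model equations, so the following is only a minimal sketch of the generation process model in its commonly cited form: ln F0(t) is the sum of a baseline ln Fb, phrase components (impulse responses scaled by command magnitudes), and accent components (step responses scaled by command amplitudes). The constants `ALPHA`, `BETA`, and `GAMMA`, all command timings and values, and the `apply_focus` helper with its `gain` parameter are illustrative assumptions, not taken from the paper. In particular, the paper realizes focus from observed differences in command magnitudes/amplitudes between utterances with and without focus, whereas this sketch simply scales one accent command.

```python
import math

# Commonly used constants for the model; assumed here, not from the paper.
ALPHA = 3.0   # natural angular frequency of the phrase control mechanism (1/s)
BETA = 20.0   # natural angular frequency of the accent control mechanism (1/s)
GAMMA = 0.9   # ceiling of the accent component


def phrase_response(t, alpha=ALPHA):
    """Impulse response of the phrase control mechanism (zero before onset)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=BETA, gamma=GAMMA):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def log_f0(t, fb, phrase_cmds, accent_cmds):
    """ln F0 at time t.

    phrase_cmds: list of (onset, magnitude)
    accent_cmds: list of (onset, offset, amplitude)
    """
    value = math.log(fb)
    for t0, ap in phrase_cmds:
        value += ap * phrase_response(t - t0)
    for t1, t2, aa in accent_cmds:
        value += aa * (accent_response(t - t1) - accent_response(t - t2))
    return value


def apply_focus(accent_cmds, focus_index, gain=1.5):
    """Hypothetical focus control: scale the amplitude of one accent command."""
    return [
        (t1, t2, aa * gain if i == focus_index else aa)
        for i, (t1, t2, aa) in enumerate(accent_cmds)
    ]


# Illustrative commands: two phrase commands within one breath group.
phrase_cmds = [(0.0, 0.5), (1.2, 0.3)]
accent_cmds = [(0.2, 0.6, 0.4), (1.4, 1.9, 0.3)]
focused_cmds = apply_focus(accent_cmds, focus_index=1)

times = [i * 0.01 for i in range(250)]  # 2.5 s sampled every 10 ms
neutral_f0 = [math.exp(log_f0(t, 120.0, phrase_cmds, accent_cmds)) for t in times]
focused_f0 = [math.exp(log_f0(t, 120.0, phrase_cmds, focused_cmds)) for t in times]
```

Because the contour is described by a handful of commands rather than by per-frame values, changing one command reshapes F0 over the whole word or phrase it governs, which is the kind of control the abstract refers to.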
CITATION STYLE
Hirose, K., Hashimoto, H., Ikeshima, J., & Minematsu, N. (2012). Fundamental frequency contour reshaping in HMM-based speech synthesis and realization of prosodic focus using generation process model. In Proceedings of the 6th International Conference on Speech Prosody, SP 2012 (Vol. 1, pp. 171–174). Tongji University Press. https://doi.org/10.21437/speechprosody.2012-46