This paper presents a framework for simulating four basic emotional styles of Vietnamese speech by applying acoustic feature transplantation techniques to neutral utterances. First, it describes analyses of the acoustic features of Vietnamese emotional speech, carried out to find the relations between variations in prosody and voice quality and the emotional states expressed in Vietnamese speech. Target pitch profiles, together with duration, energy and spectrum constraints, were then obtained by applying rules inferred from the analysis results. These rules rest on the idea that when emotional speech is synthesized from neutral speech, acoustic features are modified more in some syllables than in others, rather than uniformly across all syllables. Finally, neutral speech was morphed to produce synthesized speech with emotions. Results of perceptual tests show that the emotional styles were well recognized.
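The idea of modifying some syllables more than others can be illustrated with a minimal sketch. It is not the authors' rule set: the function name `modify_pitch_per_syllable`, the `base_shift` parameter, the per-syllable `weights`, and all numeric values are hypothetical and chosen only for illustration. Only pitch is handled here; duration, energy and spectrum are left out.

```python
from typing import List, Sequence


def modify_pitch_per_syllable(
    f0_syllables: Sequence[Sequence[float]],
    base_shift: float,
    weights: Sequence[float],
) -> List[List[float]]:
    """Scale each syllable's F0 contour (Hz) by 1 + base_shift * weight.

    Hypothetical illustration: a weight of 0 leaves the syllable neutral,
    a weight of 1 applies the full relative shift `base_shift`.
    """
    if len(f0_syllables) != len(weights):
        raise ValueError("need exactly one weight per syllable")
    modified = []
    for contour, weight in zip(f0_syllables, weights):
        factor = 1.0 + base_shift * weight
        modified.append([f0 * factor for f0 in contour])
    return modified


# Made-up neutral contours for three syllables (two F0 samples each).
neutral = [[200.0, 210.0], [190.0, 195.0], [180.0, 170.0]]

# Made-up weights: the second syllable is modified most strongly.
weights = [0.2, 1.0, 0.5]

result = modify_pitch_per_syllable(neutral, base_shift=0.3, weights=weights)
```

Setting every weight to the same value reduces this to the uniform modification that the abstract contrasts against.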
CITATION STYLE
Ngo, T. D., Akagi, M., & Bui, T. D. (2015). Toward a rule-based synthesis of Vietnamese emotional speech. In Advances in Intelligent Systems and Computing (Vol. 326, pp. 129–142). Springer Verlag. https://doi.org/10.1007/978-3-319-11680-8_11