Control Emotion Intensity for LSTM-Based Expressive Speech Synthesis

Abstract

To improve the performance of human-computer interaction interfaces, emotion is considered one of the most important factors. The major objective of expressive speech synthesis is to inject expressions reflecting different emotions into the synthesized speech. To model and control emotion effectively, emotion intensity is introduced into the expressive speech synthesis model so that it can generate speech conveying delicate and complicated emotional states. The system is composed of an emotion analysis module, which extracts an emotion intensity control vector, and a speech synthesis module, which maps text characters to the speech waveform. The proposed continuous variable, the "perception vector," is a data-driven means of controlling the model to synthesize speech with different emotion intensities. Compared with a system that uses a one-hot vector to control emotion intensity, the model using the perception vector is able to learn high-level emotion information from low-level acoustic features. Both the objective and subjective evaluations demonstrate that the perception vector outperforms the one-hot vector in terms of model controllability and flexibility.
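
The sketch below is not the authors' implementation; it only illustrates, under assumed dimensions and names, how a continuous perception vector (as opposed to a one-hot emotion label) might condition an LSTM acoustic model, and how scaling that vector at synthesis time could vary emotion intensity. The class name, layer sizes, and the four-dimensional emotion layout are all illustrative assumptions.

```python
import torch
import torch.nn as nn

class ExpressiveLSTMSynthesizer(nn.Module):
    """LSTM acoustic model conditioned on a continuous emotion-intensity
    ("perception") vector. All dimensions here are illustrative."""
    def __init__(self, text_dim=256, emo_dim=4, hidden_dim=512, acoustic_dim=80):
        super().__init__()
        # The emotion vector is concatenated with the text features at every frame.
        self.lstm = nn.LSTM(text_dim + emo_dim, hidden_dim,
                            num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden_dim, acoustic_dim)

    def forward(self, text_feats, emo_vec):
        # text_feats: (batch, time, text_dim) linguistic/character features
        # emo_vec:    (batch, emo_dim) continuous perception vector,
        #             or a one-hot emotion label for the baseline system
        emo = emo_vec.unsqueeze(1).expand(-1, text_feats.size(1), -1)
        x = torch.cat([text_feats, emo], dim=-1)  # condition every frame
        h, _ = self.lstm(x)
        return self.out(h)                        # e.g. mel-spectrogram frames

# Example: scale an (assumed) emotion direction to vary intensity at synthesis time.
model = ExpressiveLSTMSynthesizer()
text = torch.randn(1, 100, 256)               # dummy text features
happy = torch.tensor([[0.0, 1.0, 0.0, 0.0]])  # hypothetical "happy" direction
mild = model(text, 0.3 * happy)               # low emotion intensity
strong = model(text, 1.0 * happy)             # high emotion intensity
```

Because the conditioning vector is continuous rather than one-hot, intermediate intensities can in principle be reached by interpolation, which is the controllability property the abstract attributes to the perception vector.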

Citation (APA)

Zhu, X., & Xue, L. (2019). Control Emotion Intensity for LSTM-Based Expressive Speech Synthesis. In Communications in Computer and Information Science (Vol. 1059, pp. 645–656). Springer Verlag. https://doi.org/10.1007/978-981-15-0121-0_51
