Abstract
State-of-the-art speech synthesis has considerably reduced the gap between synthetic and human speech at the perceptual level. However, the impact of speech style control on perception is not well known. In this paper, we propose a method to analyze how controlling the parameters of a TTS system affects the perception of the generated sentence. This is done through visualization and analysis of listening test results. To this end, we train a speech synthesis system on several discrete categories of speech style. Each style is encoded in the network using a one-hot representation. After training, we interpolate between the vectors representing each style. A perception test showed that, despite being trained only on discrete categories of data, the network is capable of generating intermediate intensity levels between neutral and a given speech style.
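The interpolation step described in the abstract can be illustrated with a minimal sketch. It assumes linear interpolation between the neutral one-hot vector and a target style vector; the abstract does not specify the interpolation scheme, and the style names, `one_hot`, and `interpolate_style` are hypothetical, introduced only for illustration.

```python
# Hypothetical style inventory; the abstract does not list the actual categories.
STYLES = ["neutral", "happy", "sad"]


def one_hot(style):
    """Return the one-hot vector for a style name."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    return [1.0 if s == style else 0.0 for s in STYLES]


def interpolate_style(target, intensity):
    """Blend the neutral vector with a target style vector.

    Linear interpolation is an assumption made for this sketch.
    intensity = 0.0 gives pure neutral, 1.0 gives the pure target style.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must be in [0, 1]")
    neutral = one_hot("neutral")
    styled = one_hot(target)
    return [(1.0 - intensity) * n + intensity * t
            for n, t in zip(neutral, styled)]


# Print style-conditioning vectors at three intensity levels.
for level in (0.0, 0.5, 1.0):
    print(level, interpolate_style("happy", level))
```

In a trained system, a vector of this kind would replace the one-hot style input at synthesis time, so that intermediate intensities correspond to points the network never saw during training.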
Tits, N., El Haddad, K., & Dutoit, T. (2020). Neural speech synthesis with style intensity interpolation: A perceptual analysis. In ACM/IEEE International Conference on Human-Robot Interaction (pp. 485–487). IEEE Computer Society. https://doi.org/10.1145/3371382.3378297