Neural speech synthesis with style intensity interpolation: A perceptual analysis

Noé Tits; Kevin El Haddad; Thierry Dutoit

Conference Proceedings, Open Access

ACM/IEEE International Conference on Human-Robot Interaction (2020) 485-487

DOI: 10.1145/3371382.3378297

Abstract

State-of-the-art speech synthesis has considerably reduced the perceptual gap between synthetic and human speech. However, the impact of speech style control on perception is not well known. In this paper, we propose a method to analyze how controlling the parameters of a TTS system affects the perception of the generated sentence. This is done through visualization and analysis of listening test results. To this end, we train a speech synthesis system on several discrete categories of speech styles. Each style is encoded in the network using a one-hot representation. After training, we interpolate between the vectors representing each style. A perception test showed that, despite being trained only on discrete categories of data, the network is capable of generating intermediate intensity levels between neutral and a given speech style.
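The interpolation step described in the abstract can be illustrated with a minimal sketch. It assumes simple linear interpolation between the neutral one-hot vector and a target style's one-hot vector; the style labels, the `one_hot` and `interpolate_style` helper names, and the `intensity` parameter are hypothetical and are not taken from the paper, which may define its style set and interpolation differently.

```python
import numpy as np

# Hypothetical style inventory; the paper's actual categories are not listed here.
STYLES = ["neutral", "happy", "sad"]


def one_hot(style: str) -> np.ndarray:
    """Return the one-hot vector for a discrete style label."""
    vec = np.zeros(len(STYLES))
    vec[STYLES.index(style)] = 1.0
    return vec


def interpolate_style(target: str, intensity: float, base: str = "neutral") -> np.ndarray:
    """Linearly blend the base style code with the target style code.

    intensity = 0.0 gives the base code, 1.0 gives the target code,
    and values in between give intermediate style codes.
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return (1.0 - intensity) * one_hot(base) + intensity * one_hot(target)


# Style codes at increasing intensity, to be fed to a trained synthesizer
# in place of the discrete one-hot input (assumed usage).
codes = [interpolate_style("happy", a) for a in (0.0, 0.5, 1.0)]
```

Under this assumption, every interpolated code stays on the line segment between two training-time inputs, which is one way to request an intermediate intensity that the network never saw during training.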

Cite

Tits, N., El Haddad, K., & Dutoit, T. (2020). Neural speech synthesis with style intensity interpolation: A perceptual analysis. In ACM/IEEE International Conference on Human-Robot Interaction (pp. 485–487). IEEE Computer Society. https://doi.org/10.1145/3371382.3378297
