Neural speech synthesis with style intensity interpolation: A perceptual analysis

Abstract

State-of-the-art speech synthesis has considerably reduced the gap between synthetic and human speech at the perceptual level. However, the impact of speech style control on perception is not well understood. In this paper, we propose a method to analyze how controlling the parameters of a TTS system affects the perception of the generated sentence. The analysis relies on visualizing and analyzing the results of listening tests. To this end, we train a speech synthesis system on several discrete categories of speech style, each encoded in the network as a one-hot vector. After training, we interpolate between the vectors representing each style. A perception test showed that, despite being trained only on discrete categories of data, the network is capable of generating intermediate intensity levels between neutral and a given speech style.
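The interpolation step described in the abstract can be illustrated with a minimal sketch. It assumes plain linear interpolation between the neutral one-hot vector and the target style's one-hot vector; the abstract only says the vectors are interpolated, so the exact scheme, the style names in `STYLES`, and the function names here are hypothetical. The resulting vector would be passed to the trained network in place of a one-hot style code, which is not shown.

```python
from typing import List

# Hypothetical style inventory; the abstract does not list the categories used.
STYLES = ["neutral", "happy", "sad", "angry"]


def one_hot(style: str) -> List[float]:
    """Return the one-hot code for a style category."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style!r}")
    return [1.0 if s == style else 0.0 for s in STYLES]


def interpolate_style(target: str, intensity: float,
                      source: str = "neutral") -> List[float]:
    """Linearly interpolate between the source and target one-hot codes.

    intensity=0.0 returns the source code and intensity=1.0 the target code;
    values in between give an intermediate style vector (assumed scheme).
    """
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    src = one_hot(source)
    tgt = one_hot(target)
    return [(1.0 - intensity) * s + intensity * t for s, t in zip(src, tgt)]


if __name__ == "__main__":
    # Sweep the intensity from neutral to the full target style.
    for alpha in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(alpha, interpolate_style("happy", alpha))
```

Because the two endpoints are one-hot vectors, every interpolated vector still sums to 1, so the intensity value can be read as the share of the target style in the mix.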

Cite

Tits, N., El Haddad, K., & Dutoit, T. (2020). Neural speech synthesis with style intensity interpolation: A perceptual analysis. In ACM/IEEE International Conference on Human-Robot Interaction (pp. 485–487). IEEE Computer Society. https://doi.org/10.1145/3371382.3378297
