Wavenet-based speech synthesis applied to Czech: A comparison with the traditional synthesis methods

Abstract

WaveNet is a recently developed deep neural network for generating high-quality synthetic speech. It directly produces raw audio samples. This paper describes the first application of WaveNet-based speech synthesis to the Czech language. We used the basic WaveNet architecture. The phone durations and the fundamental frequency used for local conditioning were estimated by additional LSTM networks. We conducted a MUSHRA listening test to compare WaveNet with two traditional synthesis methods: unit selection and HMM-based synthesis. Experiments were performed on four large speech corpora. Unlike the results reported in other studies, our implementation of WaveNet did not outperform the unit selection method. However, there is still considerable scope for improving it, whereas unit selection TTS has probably reached its quality limit.
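
To illustrate what local conditioning on phone durations and fundamental frequency can look like in practice, here is a minimal sketch, not the authors' implementation. It assumes that each phone carries a predicted duration and a single F0 value, and that these are simply repeated once per audio sample so the conditioning sequence matches the length of the waveform. The `Phone` class, the `expand_conditioning` function, the 16 kHz sample rate, and the example values are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Phone:
    """Hypothetical container for per-phone predictions (e.g. from an LSTM)."""
    symbol: str       # phone label
    duration_ms: int  # predicted duration in milliseconds
    f0_hz: float      # predicted fundamental frequency; 0.0 marks unvoiced


def expand_conditioning(phones: List[Phone],
                        sample_rate: int = 16000) -> List[Tuple[str, float]]:
    """Repeat each phone's (symbol, F0) pair once per audio sample it covers.

    The result has one conditioning entry per output sample, which is the
    time resolution a sample-level model such as WaveNet operates at.
    The sample rate is an assumed value, not taken from the paper.
    """
    features: List[Tuple[str, float]] = []
    for phone in phones:
        n_samples = phone.duration_ms * sample_rate // 1000
        features.extend([(phone.symbol, phone.f0_hz)] * n_samples)
    return features


# Made-up example: a voiced phone of 10 ms followed by an unvoiced one of 5 ms.
phones = [Phone("a", 10, 120.0), Phone("h", 5, 0.0)]
cond = expand_conditioning(phones)
```

In a real system the conditioning features would typically be richer (for example, phonetic context and a smoothly varying F0 contour) and might be upsampled by a learned layer rather than by plain repetition.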

Cite

Hanzlíček, Z., Vít, J., & Tihelka, D. (2018). Wavenet-based speech synthesis applied to Czech: A comparison with the traditional synthesis methods. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 445–452). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_48
