Abstract
WaveNet is a recently developed deep neural network for generating high-quality synthetic speech. It produces raw audio samples directly. This paper describes the first application of WaveNet-based speech synthesis to the Czech language. We used the basic WaveNet architecture. Phone durations and the fundamental frequency required for local conditioning were estimated by additional LSTM networks. We conducted a MUSHRA listening test to compare WaveNet with two traditional synthesis methods: unit selection and HMM-based synthesis. Experiments were performed on four large speech corpora. Unlike the results reported in other studies, our implementation of WaveNet did not outperform the unit selection method; however, there is still considerable scope for improvement, whereas unit selection TTS has probably reached its quality limit.
Hanzlíček, Z., Vít, J., & Tihelka, D. (2018). Wavenet-based speech synthesis applied to Czech: A comparison with the traditional synthesis methods. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 445–452). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_48