Wavenet-based speech synthesis applied to Czech: A comparison with the traditional synthesis methods

Zdeněk Hanzlíček; Jakub Vít; Daniel Tihelka

Conference Proceedings

Lecture Notes in Computer Science, Vol. 11107 LNAI, pp. 445–452 (2018)

DOI: 10.1007/978-3-030-00794-2_48

Abstract

WaveNet is a recently developed deep neural network for generating high-quality synthetic speech; it directly produces raw audio samples. This paper describes the first application of WaveNet-based speech synthesis to the Czech language. We used the basic WaveNet architecture. The durations of individual phones and the target fundamental frequency, both used for local conditioning, were estimated by additional LSTM networks. We conducted a MUSHRA listening test to compare WaveNet with two traditional synthesis methods: unit selection and HMM-based synthesis. Experiments were performed on four large speech corpora. Unlike results reported in other studies, our WaveNet implementation did not outperform unit selection. However, there is still considerable scope for improvement, whereas unit selection TTS has probably reached its quality limit.
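Because WaveNet generates raw audio one sample at a time and is locally conditioned on frame-level features such as F0, three small pieces of arithmetic sit underneath it. The sketch below illustrates them, with several assumptions. It uses the 8-bit μ-law companding from the original WaveNet paper, which turns each sample into one of 256 classes. It upsamples frame-level F0 to the sample rate by simple repetition, which is only one common choice. It also computes the receptive field of a stack of dilated causal convolutions. The abstract does not state which of these details the authors used. The function names and parameters are hypothetical.

```python
import math
from typing import List

# Assumption: 8-bit mu-law (256 classes), as in the original WaveNet paper;
# this abstract does not say whether the authors used the same quantization.
MU = 255


def mu_law_encode(x: float, mu: int = MU) -> int:
    """Compand a sample in [-1, 1] and quantize it to an integer class in [0, mu]."""
    x = max(-1.0, min(1.0, x))  # clip to the valid amplitude range
    y = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    return int(round((y + 1.0) / 2.0 * mu))


def mu_law_decode(q: int, mu: int = MU) -> float:
    """Map a class index back to an amplitude (inverse companding, up to quantization error)."""
    y = 2.0 * q / mu - 1.0
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def upsample_frames(frame_f0: List[float], hop: int) -> List[float]:
    """Hypothetical local-conditioning step: repeat each frame-level F0 value
    `hop` times so that every audio sample has a conditioning value."""
    return [f0 for f0 in frame_f0 for _ in range(hop)]


def receptive_field(dilations: List[int], kernel: int = 2) -> int:
    """Receptive field (in samples) of stacked dilated causal convolutions."""
    return 1 + sum((kernel - 1) * d for d in dilations)


if __name__ == "__main__":
    q = mu_law_encode(0.5)
    print(q, round(mu_law_decode(q), 4))
    print(upsample_frames([100.0, 120.0], hop=3))
    # Assumed configuration: dilations 1, 2, ..., 512 repeated three times.
    print(receptive_field([2 ** i for i in range(10)] * 3))
```

Under that assumed dilation schedule, the receptive field is 3070 samples, or roughly 0.19 s of audio at a 16 kHz sampling rate. A network that sees this much context per sample still has to be told the prosody explicitly. That is the job of the LSTM-predicted phone durations and F0 described above.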

Cite (APA)

Hanzlíček, Z., Vít, J., & Tihelka, D. (2018). Wavenet-based speech synthesis applied to Czech: A comparison with the traditional synthesis methods. In Lecture Notes in Computer Science (Vol. 11107 LNAI, pp. 445–452). Springer Verlag. https://doi.org/10.1007/978-3-030-00794-2_48
