On Comparison of Phonetic Representations for Czech Neural Speech Synthesis

Jindřich Matoušek; Daniel Tihelka

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022), Vol. 13502 LNAI, pp. 410–422

DOI: 10.1007/978-3-031-16270-1_34

Abstract

In this paper, we investigate two research questions related to the phonetic representation of input text in Czech neural speech synthesis: 1) whether we can afford to reduce the phonetic alphabet, and 2) whether we can remove pauses from phonetic transcription and let the speech synthesis model predict the pause positions itself. In our experiments, three different modern speech synthesis models (FastSpeech 2 + Multi-band MelGAN, Glow-TTS + UnivNet, and VITS) were employed. We have found that the reduced phonetic alphabet outperforms the traditionally used full phonetic alphabet. On the other hand, removing pauses does not help. The presence of pauses (predicted by an external pause prediction tool) in phonetic transcription leads to a slightly better quality of synthetic speech.

APA

Matoušek, J., & Tihelka, D. (2022). On Comparison of Phonetic Representations for Czech Neural Speech Synthesis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13502 LNAI, pp. 410–422). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-16270-1_34
