Unit-selection speech synthesis adjustments for audiobook-based voices

Jakub Vít; Jindřich Matoušek

Book Chapter

Springer Verlag, (2016), 335-342

DOI: 10.1007/978-3-319-45510-5_38

Abstract

This paper presents easy-to-use modifications to the unit-selection speech-synthesis algorithm for voices built from audiobooks. Audiobooks are a very good source of large amounts of high-quality audio data for speech synthesis; however, they usually do not meet the basic requirements for standard unit-selection synthesis: “neutral” speech properties with no expressive or spontaneous expressions, stable prosodic patterns, careful pronunciation, and a consistent voice style during recording. However, if these conditions are taken into consideration, a few modifications can be made to adjust the general unit-selection algorithm and make it more robust for synthesis from such audiobook data. A listening test shows that these adjustments increased perceived speech quality and acceptability compared with a baseline TTS system. The modifications presented here also make it possible to exploit the variability of the audio data to control the pitch and tempo of synthesized speech.

Cite

Vít, J., & Matoušek, J. (2016). Unit-selection speech synthesis adjustments for audiobook-based voices. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9924 LNCS, pp. 335–342). Springer Verlag. https://doi.org/10.1007/978-3-319-45510-5_38
