Optimal feature set and minimal training size for pronunciation adaptation in TTS

Marie Tahon; Raheel Qader; Gwénolé Lecorvé; Damien Lolive

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9918 LNCS, pp. 108–119 (2016)

DOI: 10.1007/978-3-319-45925-7_9

Abstract

Text-to-Speech (TTS) systems rely on a grapheme-to-phoneme converter that is built to produce canonical, or statically stylized, pronunciations. Hence, TTS quality drops when the phoneme sequences generated by this converter are inconsistent with those labeled in the speech corpus on which the TTS system is built, or when a given expressivity is desired. To solve this problem, the present work aims at automatically adapting generated pronunciations to a given style by training a phoneme-to-phoneme conditional random field (CRF). More precisely, our work investigates (i) the choice of optimal features among acoustic, articulatory, phonological and linguistic ones, and (ii) the selection of a minimal data size to train the CRF. As a case study, adaptation to a TTS-dedicated speech corpus is performed. Cross-validation experiments show that small training corpora can be used without significantly degrading performance. Apart from improving TTS quality, these results open interesting perspectives for more complex adaptation scenarios towards expressive speech synthesis.
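
The abstract does not detail the model, so the following is only a toy sketch of what a linear-chain phoneme-to-phoneme CRF does at decoding time: given a canonical phoneme sequence, Viterbi search picks the adapted output sequence that maximizes the sum of feature-weight scores and label-transition scores. Everything specific here is an assumption made for illustration: the SAMPA-like symbols, the deletion label `_`, the feature templates (`in=`, `prev=`, `next=`, vowel/consonant class) and the hand-set weights. In the paper the weights are learned from aligned data and the features come from acoustic, articulatory, phonological and linguistic sets, which this example does not reproduce.

```python
from typing import Dict, List, Sequence, Tuple

DELETE = "_"  # hypothetical output label: canonical phoneme not realized

# Hypothetical articulatory class table for a few SAMPA-like symbols.
VOWELS = {"a", "e", "i", "o", "u", "@"}


def features(canon: Sequence[str], i: int) -> List[str]:
    """Binary observation features for position i of the canonical sequence."""
    p = canon[i]
    prev = canon[i - 1] if i > 0 else "<s>"
    nxt = canon[i + 1] if i + 1 < len(canon) else "</s>"
    return [
        f"in={p}",                                # phoneme identity
        "vowel" if p in VOWELS else "consonant",  # articulatory class
        f"prev={prev}",                           # left context
        f"next={nxt}",                            # right context
    ]


def viterbi(
    canon: Sequence[str],
    labels: Sequence[str],
    emit_w: Dict[Tuple[str, str], float],
    trans_w: Dict[Tuple[str, str], float],
) -> List[str]:
    """Highest-scoring output label sequence under a linear-chain CRF."""
    if not canon:
        return []

    def local(i: int, y: str) -> float:
        # Sum of weights of the active (feature, label) pairs at position i.
        return sum(emit_w.get((f, y), 0.0) for f in features(canon, i))

    score = [{y: local(0, y) for y in labels}]
    back: List[Dict[str, str]] = [{}]  # no predecessor at position 0
    for i in range(1, len(canon)):
        cur: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for y in labels:
            # Best previous label given the transition weight into y.
            best = max(labels, key=lambda yp: score[-1][yp] + trans_w.get((yp, y), 0.0))
            cur[y] = score[-1][best] + trans_w.get((best, y), 0.0) + local(i, y)
            ptr[y] = best
        score.append(cur)
        back.append(ptr)

    # Trace back from the best-scoring final label.
    y = max(labels, key=lambda lab: score[-1][lab])
    path = [y]
    for i in range(len(canon) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


# Hand-set toy weights (hypothetical; a trained CRF would estimate them).
LABELS = ["p", "t", "i", "@", DELETE]
EMIT = {(f"in={p}", p): 2.0 for p in LABELS if p != DELETE}  # copy by default
EMIT[("in=@", DELETE)] = 1.5   # schwa may be dropped...
EMIT[("next=t", DELETE)] = 1.0  # ...more readily before /t/
TRANS = {(DELETE, DELETE): -3.0}  # discourage consecutive deletions

print(viterbi(["p", "@", "t", "i"], LABELS, EMIT, TRANS))  # ['p', '_', 't', 'i']
print(viterbi(["p", "@"], LABELS, EMIT, TRANS))            # ['p', '@']
```

With these toy weights, the schwa in `p @ t i` is deleted because the `next=t` context pushes the deletion score (2.5) above the copy score (2.0), while the word-final schwa in `p @` is kept. Comparing feature sets or training sizes, as the abstract describes, would mean retraining such weights on different feature templates or data subsets and comparing cross-validated error rates.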

Citation (APA)

Tahon, M., Qader, R., Lecorvé, G., & Lolive, D. (2016). Optimal feature set and minimal training size for pronunciation adaptation in TTS. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9918 LNCS, pp. 108–119). Springer Verlag. https://doi.org/10.1007/978-3-319-45925-7_9
