Improvements to prosodic variation in long short-term memory based intonation models using random forest

Bálint Pál Tóth; Balázs Szórádi; Géza Németh

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2016) 9811 LNCS 386-394

DOI: 10.1007/978-3-319-43958-7_46

Abstract

Statistical parametric speech synthesis has surpassed unit selection methods in many respects, including flexibility and variability. However, the intonation of these systems is quite monotonous, especially in longer sentences, because statistical modeling reduces the variation of fundamental frequency (F0) trajectories. In this research, a random forest (RF) based classifier was trained on radio conversations labeled with the variation perceived by a human annotator. This classifier was then used to extend the labels of a phonetically balanced, studio-quality speech corpus. With the extended labels, a Long Short-Term Memory (LSTM) network was trained to model F0. Objective and subjective evaluations were carried out. The results show that the variation of the generated F0 trajectories can be fine-tuned through an additional input to the LSTM network.

Citation (APA)

Tóth, B. P., Szórádi, B., & Németh, G. (2016). Improvements to prosodic variation in long short-term memory based intonation models using random forest. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9811 LNCS, pp. 386–394). Springer Verlag. https://doi.org/10.1007/978-3-319-43958-7_46