Predicting prosodic breaks from text is a fundamental task for achieving naturalness in text-to-speech applications. In this work we build a data-driven break predictor from linguistic features such as Part of Speech (POS) tags and the forward and backward word distances to punctuation marks, and we use a basic Recurrent Neural Network (RNN) model to exploit the sequential dependency between decisions. In the experiments we evaluate a logistic regression model and the recurrent one. The results show that logistic regression outperforms the baseline (CART) by 9.5% in F-score, and that adding the recurrent layer further improves on the baseline's predictions, by 11%.
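As an illustration of the forward-backward word distance feature mentioned above, the following is a minimal sketch, not the authors' implementation. It assumes that a token counts as punctuation when it consists only of ASCII punctuation characters, that the sentence start and end act as implicit punctuation marks, and that distance is counted in words. The function name `punctuation_distances` is hypothetical.

```python
import string

# Assumption: a token is punctuation if every character is ASCII punctuation.
PUNCT_CHARS = set(string.punctuation)


def is_punctuation(token):
    """Return True for non-empty tokens made only of punctuation characters."""
    return bool(token) and all(ch in PUNCT_CHARS for ch in token)


def punctuation_distances(tokens):
    """Return (word, backward, forward) for each word token.

    backward: number of words since the previous punctuation mark
              (or the sentence start), counting the word itself.
    forward:  number of words until the next punctuation mark
              (or the sentence end), counting the word itself.
    """
    n = len(tokens)
    punct = [is_punctuation(t) for t in tokens]
    backward = [0] * n
    forward = [0] * n

    # Left-to-right pass: reset the counter at each punctuation mark.
    count = 0
    for i in range(n):
        count = 0 if punct[i] else count + 1
        backward[i] = count

    # Right-to-left pass: same idea, looking toward the next punctuation mark.
    count = 0
    for i in range(n - 1, -1, -1):
        count = 0 if punct[i] else count + 1
        forward[i] = count

    return [(tokens[i], backward[i], forward[i]) for i in range(n) if not punct[i]]


features = punctuation_distances(["Hello", ",", "how", "are", "you", "?"])
```

In a predictor of the kind the abstract describes, these two integers would be combined with each word's POS tag to form the per-word input vector; how the features are encoded in the original work is not stated here.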
Pascual, S., & Bonafonte, A. (2016). Prosodic break prediction with RNNs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10077 LNAI, pp. 64–72). Springer Verlag. https://doi.org/10.1007/978-3-319-49169-1_7