Intensity modeling for syllable based text-to-speech synthesis

Abstract

The quality of text-to-speech (TTS) synthesis systems can be improved by controlling the intensities of speech segments in addition to their durations and intonation. This paper proposes linguistic and production constraints for modeling the intensity patterns of a sequence of syllables. Linguistic constraints are represented by positional, contextual and phonological features, and production constraints are represented by articulatory features associated with the syllables. In this work, a feedforward neural network (FFNN) is proposed to model the intensities of syllables. The proposed FFNN model is evaluated by means of objective measures: average prediction error (μ), standard deviation (σ), correlation coefficient (γX,Y) and the percentage of syllables predicted within different deviations. Its prediction performance is compared with that of other statistical models, namely Linear Regression (LR) and Classification and Regression Tree (CART) models. The models are also evaluated by means of subjective listening tests on speech synthesized by incorporating the predicted syllable intensities into a Bengali TTS system. The evaluation studies show that the FFNN models achieve better prediction accuracy than the other models. © 2012 Springer-Verlag.
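
The objective measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the definitions below are assumptions: the error is taken as the absolute difference between actual and predicted intensity, σ as the population standard deviation of those errors, γX,Y as the Pearson correlation between actual and predicted values, and "within a deviation" as the absolute error expressed as a percentage of the actual value. The function name `evaluate_predictions` and the threshold values are hypothetical and do not come from the paper.

```python
import math


def evaluate_predictions(actual, predicted, thresholds=(2, 5, 10)):
    """Objective measures for predicted syllable intensities (assumed definitions)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equally long")
    n = len(actual)

    # Absolute prediction error per syllable (assumption: error = |actual - predicted|).
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    mu = sum(errors) / n
    sigma = math.sqrt(sum((e - mu) ** 2 for e in errors) / n)

    # Pearson correlation coefficient between actual and predicted values.
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted) / n)
    gamma = cov / (std_a * std_p) if std_a > 0 and std_p > 0 else float("nan")

    # Percentage of syllables whose relative deviation falls within each threshold.
    # Assumes strictly positive actual intensities (e.g. values in dB).
    pct_dev = [100.0 * e / abs(a) for e, a in zip(errors, actual)]
    within = {t: 100.0 * sum(d <= t for d in pct_dev) / n for t in thresholds}

    return {"mu": mu, "sigma": sigma, "gamma": gamma, "within": within}


if __name__ == "__main__":
    # Made-up intensity values, for illustration only.
    actual = [60.0, 70.0, 80.0]
    predicted = [63.0, 70.0, 76.0]
    print(evaluate_predictions(actual, predicted))
```

The same function could be applied to the outputs of each model under comparison (FFNN, LR, CART) on a shared test set, so that the measures are directly comparable.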

Cite

Ramu Reddy, V., & Sreenivasa Rao, K. (2012). Intensity modeling for syllable based text-to-speech synthesis. In Communications in Computer and Information Science (Vol. 306 CCIS, pp. 106–117). https://doi.org/10.1007/978-3-642-32129-0_16
