Speech synthesis is an important modality in Cognitive Infocommunications, the field at the intersection of informatics and the cognitive sciences. Statistical parametric methods have recently gained importance in speech synthesis: the speech signal is decomposed into parameters and later reconstructed from them, with speech coders performing the decomposition. We apply a novel codebook-based speech coding method to model the excitation of speech. In the analysis stage, the speech signal is processed frame by frame, and a codebook of pitch-synchronous excitations is built from the voiced parts. Timing, gain, and harmonic-to-noise ratio (HNR) parameters are extracted and fed into the machine learning stage of hidden Markov model (HMM)-based speech synthesis. In the synthesis stage, the codebook is searched for a suitable element for each voiced frame, and the selected elements are concatenated to create the excitation signal, from which the final synthesized speech is produced. Our initial experiments show that the model fits well into the statistical parametric speech synthesis framework and, in most cases, synthesizes speech of better quality than the traditional pulse-noise excitation. (This paper is an extended version of [10].)
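The synthesis stage summarized above can be illustrated with a small sketch. It is a hypothetical, simplified illustration, not the authors' implementation, and it rests on several assumptions. The codebook is assumed to hold single pitch periods of excitation taken from voiced frames. An element is chosen only by how close its length is to the target pitch period, which stands in for the paper's actual selection criterion. Length is adjusted by linear interpolation, and the HNR parameter is applied by adding white Gaussian noise. The names `Frame`, `select_element`, and `synthesize_excitation` are illustrative only.

```python
import math
import random
from typing import List, NamedTuple


class Frame(NamedTuple):
    """Hypothetical per-frame parameters, as an HMM stage might predict them."""
    voiced: bool
    period: int     # pitch period in samples (ignored for unvoiced frames)
    gain: float     # target RMS of the excitation
    hnr_db: float   # harmonic-to-noise ratio in dB (used for voiced frames only)
    length: int     # nominal frame length in samples


def rms(x: List[float]) -> float:
    """Root-mean-square value of a sequence (0.0 for an empty sequence)."""
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def resample(x: List[float], n: int) -> List[float]:
    """Linearly interpolate x to exactly n samples (assumption: simple length change)."""
    if n == 1:
        return [x[0]]
    out = []
    for i in range(n):
        pos = i * (len(x) - 1) / (n - 1)
        j = int(pos)
        frac = pos - j
        nxt = x[min(j + 1, len(x) - 1)]
        out.append(x[j] * (1.0 - frac) + nxt * frac)
    return out


def select_element(codebook: List[List[float]], period: int) -> List[float]:
    """Pick the codebook element whose length is closest to the target period
    (a stand-in for the paper's own selection criterion)."""
    return min(codebook, key=lambda e: abs(len(e) - period))


def synthesize_excitation(frames: List[Frame], codebook: List[List[float]],
                          seed: int = 0) -> List[float]:
    """Build an excitation signal frame by frame.

    Voiced frames: whole pitch periods from the codebook, resampled to the
    target period, scaled to the target RMS, plus noise set by the HNR.
    Unvoiced frames: white Gaussian noise with the target RMS.
    """
    rng = random.Random(seed)
    excitation: List[float] = []
    for f in frames:
        if f.voiced:
            if f.period <= 0:
                raise ValueError("voiced frames need a positive pitch period")
            # Noise std chosen so harmonic power / noise power = 10^(hnr_db/10).
            noise_std = f.gain * 10 ** (-f.hnr_db / 20)
            produced = 0
            while produced < f.length:  # pitch-synchronous: whole periods only
                elem = resample(select_element(codebook, f.period), f.period)
                level = rms(elem)
                scale = f.gain / level if level > 0 else 0.0
                excitation.extend(v * scale + rng.gauss(0.0, noise_std) for v in elem)
                produced += f.period
        else:
            excitation.extend(rng.gauss(0.0, f.gain) for _ in range(f.length))
    return excitation
```

Because voiced frames are filled with whole pitch periods, a voiced frame may run slightly past its nominal length. A fuller system would likely track period boundaries across frames and smooth the joins between elements. The resulting excitation would then drive the spectral filter to produce the final speech, a step omitted here.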
Csapó, T. G., & Németh, G. (2014). Statistical parametric speech synthesis with a novel codebook-based excitation model. In Intelligent Decision Technologies (Vol. 8, pp. 289–299). IOS Press. https://doi.org/10.3233/IDT-140197