Statistical parametric speech synthesis with a novel codebook-based excitation model

3Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

Abstract

Speech synthesis is an important modality in Cognitive Infocommunications, which is the intersection of informatics and cognitive sciences. Statistical parametric methods have gained importance in speech synthesis recently. The speech signal is decomposed to parameters and later restored from them. The decomposition is implemented by speech coders. We apply a novel codebook-based speech coding method to model the excitation of speech. In the analysis stage the speech signal is analyzed frame-by-frame and a codebook of pitch synchronous excitations is built from the voiced parts. Timing, gain and harmonic-to-noise ratio parameters are extracted and fed into the machine learning stage of Hidden Markov-model based speech synthesis. During the synthesis stage the codebook is searched for a suitable element in each voiced frame and these are concatenated to create the excitation signal, from which the final synthesized speech is created. Our initial experiments show that the model fits well in the statistical parametric speech synthesis framework and in most cases it can synthesize speech in a better quality than the traditional pulse-noise excitation. (This paper is an extended version of [10].)

Cite

CITATION STYLE

APA

Csapó, T. G., & Németh, G. (2014). Statistical parametric speech synthesis with a novel codebook-based excitation model. In Intelligent Decision Technologies (Vol. 8, pp. 289–299). IOS Press. https://doi.org/10.3233/IDT-140197

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free