The naturalness of synthetic speech depends on the automatic extraction of prosodic features and on prosody modeling. To improve the naturalness of synthesized speech, we apply the concept of analysis-by-synthesis to prosodic information. In a recognition model, the accents and phrases of the speech signal were extracted using the quantitative Fujisaki model. In a generative model, we resynthesized the speech signal using a cepstrum vocoder. The excitation signal of the vocoder is given by the pitch marks (PM), which were calculated from multiple levels of the accent and phrase marking algorithm. A preference test was performed to assess the proposed method. For every speech signal, four signals were resynthesized according to the calculated PM, and evaluators compared the resynthesized signals with one another. The results show that the quality of the resynthesized signal is better after prosodic marking.
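To make the Fujisaki model mentioned in the abstract concrete, the following is a minimal sketch of its standard formulation: log F0 is a base value plus phrase components (impulse responses) and accent components (capped step responses). The function names, the parameter defaults (alpha = 2.0/s, beta = 20.0/s, gamma = 0.9), and the `pitch_marks` helper are assumptions chosen for illustration. In particular, `pitch_marks` simply steps one pitch period at a time through a fully voiced contour and is not the accent and phrase marking algorithm used in the paper.

```python
import math


def phrase_response(t, alpha=2.0):
    """Impulse response of the phrase control mechanism (zero for t < 0)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0


def accent_response(t, beta=20.0, gamma=0.9):
    """Step response of the accent control mechanism, capped at gamma."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)


def fujisaki_f0(t, fb, phrases, accents, alpha=2.0, beta=20.0, gamma=0.9):
    """Return F0 in Hz at time t (seconds).

    fb:      base frequency in Hz
    phrases: list of (T0, Ap) phrase command onsets and magnitudes
    accents: list of (T1, T2, Aa) accent command onsets, offsets, amplitudes
    """
    log_f0 = math.log(fb)
    for t0, ap in phrases:
        log_f0 += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        log_f0 += aa * (
            accent_response(t - t1, beta, gamma)
            - accent_response(t - t2, beta, gamma)
        )
    return math.exp(log_f0)


def pitch_marks(duration, fb, phrases, accents):
    """Hypothetical helper: place one mark per pitch period.

    Assumes the whole signal is voiced and advances by 1/F0,
    with F0 evaluated at the current mark.
    """
    marks, t = [], 0.0
    while t < duration:
        marks.append(t)
        t += 1.0 / fujisaki_f0(t, fb, phrases, accents)
    return marks


# Example: one phrase command at 0 s and one accent command from 0.2 s to 0.5 s.
marks = pitch_marks(1.0, fb=100.0, phrases=[(0.0, 0.5)], accents=[(0.2, 0.5, 0.4)])
```

Marks produced this way lie closer together wherever the accent or phrase components raise F0, which is the property a pitch-synchronous excitation signal relies on.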
Hussein, H., Strecha, G., & Hoffmann, R. (2010). Resynthesis of prosodic information using the cepstrum vocoder. In Proceedings of the International Conference on Speech Prosody. International Speech Communication Association. https://doi.org/10.21437/speechprosody.2010-102