First steps towards new Czech voice conversion system

Abstract

This paper describes initial experiments on building a new Czech voice conversion system. Voice conversion (VC) modifies the speech signal produced by one (source) speaker so that it sounds as if it were spoken by another (target) speaker. With VC, a new voice for a speech synthesizer can be prepared without recording a large amount of new speech data. The transformation is estimated from the same sentences uttered by both speakers; these sentences are time-aligned using a modified dynamic time warping algorithm. The conversion is divided into two stages corresponding to the source-filter model of speech production. In this work we employ a conversion function based on a Gaussian mixture model to transform the spectral envelope, which is described by line spectral frequencies. Residuals are converted using so-called residual prediction techniques. Unlike other similar research, we predict residuals not from the transformed spectral envelope but directly from the source speech. Four versions of residual prediction are described and compared in this study. Objective evaluation of the converted speech using performance metrics shows that our system is comparable to similar existing VC systems. © Springer-Verlag Berlin Heidelberg 2006.
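
The time-alignment step mentioned in the abstract can be illustrated with a minimal sketch of plain dynamic time warping. This is the textbook algorithm, not the authors' modified variant, whose details the abstract does not give. The function name dtw_align, the Euclidean frame distance, and the three-way step pattern are assumptions made for illustration; each frame stands for a feature vector such as a set of line spectral frequencies.

```python
import math


def dtw_align(source, target):
    """Align two sequences of feature vectors with textbook DTW.

    Assumption: Euclidean distance between frames and the basic
    (diagonal, vertical, horizontal) step pattern. The paper uses a
    modified DTW that is not specified in the abstract.
    Returns (total_cost, path), where path is a list of
    (source_index, target_index) pairs.
    """
    n, m = len(source), len(target)
    if n == 0 or m == 0:
        raise ValueError("both sequences must be non-empty")

    # cost[i][j]: minimal accumulated distance aligning
    # source[:i] with target[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(source[i - 1], target[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j - 1],  # advance in both sequences
                cost[i - 1][j],      # advance in source only
                cost[i][j - 1],      # advance in target only
            )

    # Backtrack from the end to recover the frame pairing.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        )
    path.reverse()
    return cost[n][m], path


# Toy one-dimensional "frames"; the target repeats its middle frame.
source_frames = [(0.0,), (1.0,), (2.0,)]
target_frames = [(0.0,), (1.0,), (1.0,), (2.0,)]
total_cost, path = dtw_align(source_frames, target_frames)
```

In a conversion system of the kind described, the aligned frame pairs in path would serve as the paired training data from which the source-to-target mapping is estimated.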

Citation

Hanzlíček, Z., & Matoušek, J. (2006). First steps towards new Czech voice conversion system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4188 LNCS, pp. 383–390). Springer Verlag. https://doi.org/10.1007/11846406_48
