This paper addresses the improvement of voice conversion (VC) methods for text-to-speech (TTS) synthesis systems that can be tuned to a target speaker. It presents such a system, with a VC module in the acoustic processor and a parametric representation of the speech database for concatenative synthesis based on an instantaneous harmonic representation. Voice conversion is based on a multiple regression mapping function and a Gaussian mixture model (GMM); text-independent learning is based on hidden Markov models and a modified Viterbi algorithm. An experimental evaluation of the proposed solutions in terms of naturalness and similarity is also presented.
Zahariev, V., Azarov, E., & Petrovsky, A. (2017). Voice conversion for TTS systems with tuning on the target speaker based on GMM. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10458 LNAI, pp. 788–798). Springer Verlag. https://doi.org/10.1007/978-3-319-66429-3_79