Voice conversion is a technique that transforms the source speaker individuality to that of the target speaker. We propose the simple and intuitive voice conversion algorithm not using training data between different languages and it uses text-to-speech generated speech rather than recorded real voices. The suggested method reconstructed the voice after transforming line spectral frequencies (LSF) by formant space warping functions. The formant space is the space consisted of representative four monophthongs for each language. The warping functions are represented by piecewise linear equations using pairs of four formants at matched monophthongs. In this paper, we applied LSF to voice conversion because LSF are not overly sensitive to quantization noise and can be interpolated. From experimental results, LSF based voice conversion shows good results for ABX and MOS tests than the direct frequency warping approaches.
CITATION STYLE
Yun, Y. S., Jung, J., & Eun, S. (2015). Voice conversion between synthesized bilingual voices using line spectral frequencies. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9319, pp. 463–471). Springer Verlag. https://doi.org/10.1007/978-3-319-23132-7_57
Mendeley helps you to discover research relevant for your work.