Multi-task WaveRNN with an integrated architecture for cross-lingual voice conversion


Abstract

Spoken languages are similar phonetically because humans share a common vocal production system. However, each language has a unique phonetic repertoire and its own phonotactic rules. In cross-lingual voice conversion, the source and target speakers speak different languages. The challenge is how to project the speaker identity of the source speaker onto that of the target across two different phonetic systems. A typical voice conversion system employs a generator-vocoder pipeline, where the generator is responsible for conversion and the vocoder for waveform reconstruction. We propose a novel multi-task WaveRNN with an integrated architecture for cross-lingual voice conversion. The WaveRNN is trained on two sets of monolingual data via two-task learning. The integrated architecture takes linguistic features as input and outputs the speech waveform directly. Voice conversion experiments conducted between English and Mandarin confirm the effectiveness of the proposed method in terms of speech quality and speaker similarity.
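The abstract describes a single network with a shared recurrent core and two language-specific tasks, taking linguistic features as input and emitting waveform samples directly. As a rough illustration of that parameter-sharing idea (not the paper's actual model), the sketch below uses a tiny NumPy recurrent cell shared across languages, with one output head per monolingual corpus predicting 8-bit mu-law sample classes; all dimensions and weight values are placeholder assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions, not taken from the paper.
FEAT_DIM = 16      # linguistic-feature frame size (assumed)
HID_DIM = 32       # shared recurrent state size (assumed)
N_CLASSES = 256    # 8-bit mu-law sample classes, as in WaveRNN

# Shared recurrent core: one set of weights serves both languages.
W_in = rng.normal(0, 0.1, (HID_DIM, FEAT_DIM + 1))  # +1 for the previous sample
W_rec = rng.normal(0, 0.1, (HID_DIM, HID_DIM))

# Two task-specific output heads, one per monolingual corpus.
heads = {
    "en": rng.normal(0, 0.1, (N_CLASSES, HID_DIM)),
    "zh": rng.normal(0, 0.1, (N_CLASSES, HID_DIM)),
}

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def generate(features, lang):
    """Autoregressively emit one mu-law class per linguistic-feature frame."""
    h = np.zeros(HID_DIM)
    prev = 0.0
    out = []
    for f in features:
        x = np.concatenate([f, [prev]])
        h = np.tanh(W_in @ x + W_rec @ h)      # shared core
        p = softmax(heads[lang] @ h)           # language-specific head
        c = int(np.argmax(p))                  # greedy sampling for the demo
        prev = c / (N_CLASSES - 1) * 2.0 - 1.0 # map class back to [-1, 1]
        out.append(c)
    return np.array(out)

frames = rng.normal(0, 1, (10, FEAT_DIM))      # dummy linguistic features
wave_en = generate(frames, "en")
wave_zh = generate(frames, "zh")
print(wave_en.shape, wave_zh.shape)
```

During training, gradients from both tasks would update the shared core while each head sees only its own language's data; a real implementation would use a WaveRNN-style GRU with sparse weights rather than this toy cell.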

Citation (APA)

Zhou, Y., Tian, X., & Li, H. (2020). Multi-task WaveRNN with an integrated architecture for cross-lingual voice conversion. IEEE Signal Processing Letters, 27, 1310–1314. https://doi.org/10.1109/LSP.2020.3010163
