Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis


Abstract

This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals into a neural filter framework. The SF-GAN vocoder is composed of a source module and a resolution-wise conditional filter module, and it is trained with a generative adversarial strategy. The source module produces an excitation signal from the F0 information. The resolution-wise conditional filter module then combines this excitation signal with processed acoustic features at various temporal resolutions and finally reconstructs the raw waveform. The experimental results show that the proposed SF-GAN vocoder outperforms the state-of-the-art HiFi-GAN and Fre-GAN vocoders in both analysis-synthesis (AS) and text-to-speech (TTS) tasks, and that the quality of the speech synthesized by SF-GAN is comparable to that of the ground-truth audio.
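To make the idea of an "F0-based source excitation" concrete, the sketch below turns frame-level F0 values into a sample-level excitation signal. It assumes a sine-plus-noise scheme in the style of earlier neural source-filter (NSF) vocoders: a phase-continuous sine wave with a little additive noise in voiced frames, and Gaussian noise in unvoiced frames (F0 = 0). This is an illustration only, not the SF-GAN source module as described in the paper, and the function name `sine_excitation` and its parameters (`sample_rate`, `hop_length`, `sine_amp`, `noise_std`) are hypothetical.

```python
import math
import random


def sine_excitation(f0_frames, sample_rate=22050, hop_length=256,
                    sine_amp=0.1, noise_std=0.003, seed=0):
    """Hypothetical sine-plus-noise source module (NSF-style assumption).

    f0_frames: frame-level F0 values in Hz; 0 marks an unvoiced frame.
    Each frame is upsampled to hop_length samples by repetition.
    Returns a list of len(f0_frames) * hop_length excitation samples.
    """
    rng = random.Random(seed)  # seeded so the noise is reproducible
    excitation = []
    phase = 0.0
    for f0 in f0_frames:
        for _ in range(hop_length):
            if f0 > 0:
                # Accumulate instantaneous phase so the sine stays continuous
                # across frame boundaries even when F0 changes.
                phase = (phase + 2.0 * math.pi * f0 / sample_rate) % (2.0 * math.pi)
                value = sine_amp * math.sin(phase) + rng.gauss(0.0, noise_std)
            else:
                # Unvoiced frame: noise only, scaled relative to the sine amplitude.
                value = rng.gauss(0.0, sine_amp / 3.0)
            excitation.append(value)
    return excitation


if __name__ == "__main__":
    # Four frames: unvoiced, two voiced frames around 120 Hz, unvoiced.
    e = sine_excitation([0.0, 120.0, 125.0, 0.0])
    print(len(e))  # 4 frames * 256 samples per frame
```

In a source-filter vocoder of this kind, a signal like this would be fed to the neural filter alongside the acoustic features, so the filter network does not have to learn periodicity from scratch; how SF-GAN injects it at each temporal resolution is not specified in this abstract.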


Lu, Y. X., Ai, Y., & Ling, Z. H. (2023). Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis. In Communications in Computer and Information Science (Vol. 1765 CCIS, pp. 68–80). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-99-2401-1_6
