Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio with a CPU


Abstract

This paper investigates a real-time neural speech synthesis system on CPUs that can synthesize high-fidelity 48 kHz speech waveforms covering the entire frequency range audible to humans. Most previous studies on 48 kHz speech synthesis have used traditional source-filter vocoders or a WaveNet vocoder for waveform generation, but these approaches have drawbacks in synthesis quality or inference speed. LPCNet was proposed as a real-time neural vocoder for mobile CPUs, but its sampling frequency is limited to 16 kHz. In this paper, we propose Full-band LPCNet, which synthesizes high-fidelity 48 kHz speech waveforms on a CPU through a few simple but effective modifications to the conventional LPCNet. We then evaluate its synthesis quality on both normal speech and singing voice. The results of these experiments demonstrate that, among the neural vocoders compared, the proposed Full-band LPCNet is the only one that can synthesize high-quality 48 kHz speech waveforms while maintaining real-time capability on a CPU.
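The abstract contrasts 16 kHz and 48 kHz sampling; a little arithmetic shows why the higher rate matters for both fidelity and speed. The sketch below is an illustration, not code from the paper. It assumes the commonly cited ~20 kHz upper limit of human hearing and a naive vocoder that spends one network step per output sample (as the original 16 kHz LPCNet does); all names are hypothetical.

```python
# Illustrative arithmetic only; not taken from the Full-band LPCNet paper.
# Assumption: ~20 kHz is used as the upper limit of human hearing.
HUMAN_HEARING_LIMIT_HZ = 20_000


def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a signal sampled at sample_rate_hz can represent."""
    return sample_rate_hz / 2


def per_sample_budget_us(sample_rate_hz: int) -> float:
    """Time available per generated sample (microseconds) to stay real time,
    assuming a hypothetical vocoder that produces one sample per step."""
    return 1e6 / sample_rate_hz


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Synthesis time divided by audio duration; below 1.0 means faster than real time."""
    return synthesis_seconds / audio_seconds


if __name__ == "__main__":
    for fs in (16_000, 48_000):
        print(
            f"{fs} Hz: Nyquist {nyquist_hz(fs):.0f} Hz, "
            f"covers hearing range: {nyquist_hz(fs) >= HUMAN_HEARING_LIMIT_HZ}, "
            f"budget {per_sample_budget_us(fs):.1f} us/sample"
        )
```

Running it reports an 8000 Hz bandwidth and a 62.5 us per-sample budget at 16 kHz, versus 24000 Hz and about 20.8 us at 48 kHz. A 48 kHz vocoder therefore has to produce three times as many samples per second with a third of the time for each, which is one way to see why extending LPCNet to full band takes more than changing the sampling rate.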

Cite (APA)

Matsubara, K., Okamoto, T., Takashima, R., Takiguchi, T., Toda, T., Shiga, Y., & Kawai, H. (2021). Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio with a CPU. IEEE Access, 9, 94923–94933. https://doi.org/10.1109/ACCESS.2021.3089565
