Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio with a CPU

Keisuke Matsubara; Takuma Okamoto; Ryoichi Takashima; Tetsuya Takiguchi; Tomoki Toda; Yoshinori Shiga; Hisashi Kawai

Journal Article, Open Access

IEEE Access (2021) 9 94923-94933

DOI: 10.1109/ACCESS.2021.3089565

Abstract

This paper investigates a real-time neural speech synthesis system on CPUs that can synthesize high-fidelity 48 kHz speech waveforms to cover the entire frequency range audible by human beings. Although most previous studies on 48 kHz speech synthesis have used traditional source-filter vocoders or a WaveNet vocoder for waveform generation, they have some drawbacks regarding synthesis quality or inference speed. LPCNet was proposed as a real-time neural vocoder with a mobile CPU but its sampling frequency is still only 16 kHz. In this paper, we propose a Full-band LPCNet to synthesize high-fidelity 48 kHz speech waveforms with a CPU by introducing some simple but effective modifications to the conventional LPCNet. We then evaluate the synthesis quality using both normal speech and a singing voice. The results of these experiments demonstrate that the proposed Full-band LPCNet is the only neural vocoder that can synthesize high-quality 48 kHz speech waveforms while maintaining real-time capability with a CPU.

Citation (APA)

Matsubara, K., Okamoto, T., Takashima, R., Takiguchi, T., Toda, T., Shiga, Y., & Kawai, H. (2021). Full-Band LPCNet: A Real-Time Neural Vocoder for 48 kHz Audio with a CPU. IEEE Access, 9, 94923–94933. https://doi.org/10.1109/ACCESS.2021.3089565
