Quasi-periodic parallel WaveGAN: A non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network

Yi Chiao Wu; Tomoki Hayashi; Takuma Okamoto; Hisashi Kawai; Tomoki Toda

Journal Article (Open Access)

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2021), vol. 29, pp. 792-806

DOI: 10.1109/TASLP.2021.3051765

20 citations

11 readers

Abstract

In this paper, we propose a quasi-periodic parallel WaveGAN (QPPWG) waveform generative model, which applies a quasi-periodic (QP) structure to a parallel WaveGAN (PWG) model using pitch-dependent dilated convolution networks (PDCNNs). PWG is a small-footprint GAN-based raw waveform generative model, whose generation time is much faster than real time because of its compact model and non-autoregressive (non-AR) and non-causal mechanisms. Although PWG achieves high-fidelity speech generation, the generic and simple network architecture lacks pitch controllability for an unseen auxiliary fundamental frequency (F0) feature such as a scaled F0. To improve the pitch controllability and speech modeling capability, we apply a QP structure with PDCNNs to PWG, which introduces pitch information to the network by dynamically changing the network architecture corresponding to the auxiliary F0 feature. Both objective and subjective experimental results show that QPPWG outperforms PWG when the auxiliary F0 feature is scaled. Moreover, analyses of the intermediate outputs of QPPWG also show better tractability and interpretability of QPPWG, which respectively models spectral and excitation-like signals using the cascaded fixed and adaptive blocks of the QP structure.
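
The idea of a network whose dilation follows the auxiliary F0 can be illustrated with a small NumPy sketch. It is not the authors' implementation: the abstract does not give the dilation rule, so the form used here (sampling rate divided by F0 times a "dense factor", rounded to an integer) is an assumption, as are the function names, the default values `fs=24000` and `dense_factor=4`, the fallback to a fixed dilation for unvoiced samples, and the zero padding at the signal edges.

```python
import numpy as np


def pitch_dependent_dilation(f0, fs=24000, dense_factor=4, base_dilation=1):
    """Per-sample dilation size derived from F0 (hypothetical rule).

    Assumed form: dilation = base_dilation * fs / (f0 * dense_factor),
    rounded to the nearest integer and kept at 1 or more.
    Unvoiced samples (f0 <= 0) fall back to base_dilation (an assumption).
    """
    f0 = np.asarray(f0, dtype=float)
    scale = np.ones_like(f0)
    voiced = f0 > 0
    scale[voiced] = fs / (f0[voiced] * dense_factor)
    return np.maximum(1, np.rint(scale * base_dilation)).astype(int)


def pitch_dependent_dilated_conv(x, f0, weights, **dilation_kwargs):
    """Non-causal 3-tap convolution with a time-varying dilation d[t]:

        y[t] = w_past * x[t - d[t]] + w_now * x[t] + w_future * x[t + d[t]]

    Taps that fall outside the signal are treated as zero.
    """
    x = np.asarray(x, dtype=float)
    d = pitch_dependent_dilation(f0, **dilation_kwargs)
    t = np.arange(len(x))

    def tap(idx):
        # Gather x at idx, substituting 0.0 where idx is out of range.
        inside = (idx >= 0) & (idx < len(x))
        return np.where(inside, x[np.clip(idx, 0, len(x) - 1)], 0.0)

    w_past, w_now, w_future = weights
    return w_past * tap(t - d) + w_now * x + w_future * tap(t + d)


# Usage: the same layer reaches a different distance back when F0 is scaled.
x = np.arange(100.0)
f0 = np.full(100, 200.0)  # 200 Hz everywhere
y = pitch_dependent_dilated_conv(x, f0, (1.0, 0.0, 0.0))
y_scaled = pitch_dependent_dilated_conv(x, 2.0 * f0, (1.0, 0.0, 0.0))
```

Under these assumptions, doubling F0 halves the dilation, so the receptive field of the layer tracks the pitch period instead of staying fixed. This is the mechanism the abstract credits for better behaviour under a scaled F0.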

Citation (APA)

Wu, Y. C., Hayashi, T., Okamoto, T., Kawai, H., & Toda, T. (2021). Quasi-periodic parallel WaveGAN: A non-autoregressive raw waveform generative model with pitch-dependent dilated convolution neural network. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 792–806. https://doi.org/10.1109/TASLP.2021.3051765