Revisiting Denoising Diffusion Probabilistic Models for Speech Enhancement: Condition Collapse, Efficiency and Refinement

Wenxin Tai; Fan Zhou; Goce Trajcevski; Ting Zhong

Conference Proceedings, Open Access

Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (2023) 37 13627-13635

DOI: 10.1609/aaai.v37i11.26597

29 Citations

14 Readers

Abstract

Recent literature has shown that denoising diffusion probabilistic models (DDPMs) can synthesize high-fidelity samples whose quality is competitive with, and sometimes better than, that of previous state-of-the-art approaches. However, few attempts have been made to apply DDPMs to speech enhancement, and the reported performance of existing works is relatively poor, significantly inferior to that of other generative methods. In this work, we first reveal the difficulties in applying existing diffusion models to speech enhancement. We then introduce DR-DiffuSE, a simple and effective framework for speech enhancement using conditional diffusion models. We present three strategies (two in diffusion training and one in reverse sampling) to tackle condition collapse and ensure that the condition information is sufficiently used. For efficiency, we introduce a fast sampling technique that reduces the sampling process to several steps, and we exploit a refinement network to calibrate the defective speech. Our proposed method achieves performance competitive with the state-of-the-art GAN-based model and shows a significant improvement over existing DDPM-based algorithms.

Cite (APA)

Tai, W., Zhou, F., Trajcevski, G., & Zhong, T. (2023). Revisiting Denoising Diffusion Probabilistic Models for Speech Enhancement: Condition Collapse, Efficiency and Refinement. In Proceedings of the 37th AAAI Conference on Artificial Intelligence, AAAI 2023 (Vol. 37, pp. 13627–13635). AAAI Press. https://doi.org/10.1609/aaai.v37i11.26597
