Noise Modeling to Build Training Sets for Robust Speech Enhancement


Abstract

DNN-based speech enhancement (SE) models suffer significant performance degradation on real recordings because of the mismatch between the synthetic datasets used for training and real test sets. To address this problem, we propose a new Generative Adversarial Network framework for Noise Modeling (NM-GAN) that creates realistic paired training sets by imitating the distribution of real noise. The generator combines a novel 7-layer U-Net with two bidirectional long short-term memory (LSTM) layers to construct complex noise. Through adversarial and alternate training, NM-GAN produces samples with sufficient recall (diversity) and precision (noise quality), effectively simulating real noise, which is then used to compose realistic paired training sets. Extensive experiments with a range of qualitative and quantitative evaluation metrics verify the effectiveness of the generated noise samples and training sets and demonstrate the framework’s capabilities.
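The abstract does not say how generated noise and clean speech are combined into pairs. A common way to do this in SE work is to add the noise to a clean utterance at a chosen signal-to-noise ratio (SNR). The sketch below illustrates that idea only; the function names, the additive mixing rule, the 5 dB SNR, and the toy sine and Gaussian signals are all assumptions for illustration and are not taken from the paper.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean-to-noise power ratio equals `snr_db` (dB),
    then add it to `clean`. Returns (noisy, scaled_noise)."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0:
        raise ValueError("noise is silent; cannot scale to a target SNR")
    # SNR_dB = 10 * log10(P_clean / P_noise)  =>  P_noise = P_clean / 10^(SNR_dB / 10)
    target_noise_power = power(clean) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


# Toy stand-ins (hypothetical): a 440 Hz tone in place of clean speech,
# Gaussian samples in place of noise drawn from a trained generator.
rng = random.Random(0)
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
noise = [rng.gauss(0.0, 1.0) for _ in range(sample_rate)]

noisy, scaled_noise = mix_at_snr(clean, noise, snr_db=5.0)
pair = (noisy, clean)  # (model input, training target) for a supervised SE model
```

In a real pipeline the noise would come from the trained generator and the SNR would typically be sampled from a range, so that the resulting pairs cover varied conditions.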

Cite

Wang, Y., Zhang, W., Wu, Z., Kong, X., Wang, Y., & Zhang, H. (2022). Noise Modeling to Build Training Sets for Robust Speech Enhancement. Applied Sciences (Switzerland), 12(4). https://doi.org/10.3390/app12041905
