Improving the Speech Enhancement Model with Discrete Wavelet Transform Sub-Band Features in Adaptive FullSubNet †

Abstract

Recent advancements in speech enhancement (SE) have leveraged deep neural networks with multi-domain features to improve noise suppression. This study introduces a wavelet-enhanced adaptive FullSubNet (WA-FSN) framework that replaces traditional short-time Fourier transform (STFT)-based complex spectrograms with discrete wavelet transform (DWT) sub-band features while retaining magnitude spectrogram inputs. Evaluated on the VoiceBank-DEMAND dataset, WA-FSN with one-level DWT features achieves a PESQ score of 2.8889 (+3.6% vs. baseline A-FSN’s 2.7885) and SI-SNR of 18.55 dB (+3% vs. 18.02 dB), while two-level DWT extensions reach 2.8937 PESQ (+3.8%) and 18.83 dB SI-SNR (+4.5%). The framework maintains computational efficiency through LSTM-based fusion models, requiring only six additional convolution operations for DWT feature extraction. Quantitative analysis reveals that low-frequency sub-bands contribute most to PESQ improvements (2.8937 for the lowest three sub-bands), while high-frequency sub-bands enhance SI-SNR (18.83 dB for the highest two sub-bands). These results demonstrate that wavelet-derived features complement STFT magnitude spectra effectively, providing richer time-frequency representations for complex ideal ratio mask estimation in challenging noise conditions.
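The sub-band extraction described in the abstract can be illustrated with a small filter-bank sketch. This is not the authors' implementation: the Haar filters, the zero-padding of odd-length inputs, and the function names `analysis_step` and `two_level_subbands` are assumptions made for illustration. The choice to split both the low and the high branch at the second level is also an assumption, picked because it yields six convolutions in total (two at level one, four at level two), which matches the count stated in the abstract.

```python
import numpy as np

# Haar analysis filters. ASSUMPTION: the abstract does not name the wavelet.
LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)
HIGH = np.array([-1.0, 1.0]) / np.sqrt(2.0)


def analysis_step(x):
    """One filter-bank stage: two convolutions, each downsampled by 2."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # zero-pad odd-length input (assumption)
        x = np.append(x, 0.0)
    low = np.convolve(x, LOW)[1::2]     # (x[2k] + x[2k+1]) / sqrt(2)
    high = np.convolve(x, HIGH)[1::2]   # (x[2k] - x[2k+1]) / sqrt(2)
    return low, high


def two_level_subbands(x):
    """Split both level-one branches again: 2 + 4 = 6 convolutions in total.

    ASSUMPTION: splitting the high branch as well is a guess based on the
    convolution count in the abstract. Keys name the filter path taken and
    are not sorted by frequency.
    """
    low, high = analysis_step(x)        # convolutions 1-2
    ll, lh = analysis_step(low)         # convolutions 3-4
    hl, hh = analysis_step(high)        # convolutions 5-6
    return {"LL": ll, "LH": lh, "HL": hl, "HH": hh}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    signal = rng.standard_normal(16000)  # stand-in for one second at 16 kHz
    bands = two_level_subbands(signal)
    for name, band in bands.items():
        print(name, band.shape)
```

Because the Haar filters are orthonormal, the sub-bands of an even-length signal carry the same total energy as the input, which is a convenient sanity check for any replacement wavelet. In a pipeline like the one the abstract describes, sub-band signals of this kind would be framed and supplied to the network alongside the STFT magnitude spectrogram.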

Cite

Wu, Z. T., & Hung, J. W. (2025). Improving the Speech Enhancement Model with Discrete Wavelet Transform Sub-Band Features in Adaptive FullSubNet †. Electronics (Switzerland), 14(7). https://doi.org/10.3390/electronics14071354
