Abstract
Recent advancements in speech enhancement (SE) have leveraged deep neural networks with multi-domain features to improve noise suppression. This study introduces a wavelet-enhanced adaptive FullSubNet (WA-FSN) framework that replaces traditional short-time Fourier transform (STFT)-based complex spectrograms with discrete wavelet transform (DWT) sub-band features while retaining magnitude spectrogram inputs. Evaluated on the VoiceBank-DEMAND dataset, WA-FSN with one-level DWT features achieves a PESQ score of 2.8889 (+3.6% vs. baseline A-FSN’s 2.7885) and SI-SNR of 18.55 dB (+3% vs. 18.02 dB), while two-level DWT extensions reach 2.8937 PESQ (+3.8%) and 18.83 dB SI-SNR (+4.5%). The framework maintains computational efficiency through LSTM-based fusion models, requiring only six additional convolution operations for DWT feature extraction. Quantitative analysis reveals that low-frequency sub-bands contribute most to PESQ improvements (2.8937 for the lowest three sub-bands), while high-frequency sub-bands enhance SI-SNR (18.83 dB for the highest two sub-bands). These results demonstrate that wavelet-derived features complement STFT magnitude spectra effectively, providing richer time-frequency representations for complex ideal ratio mask estimation in challenging noise conditions.
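To make the sub-band idea concrete, the following minimal sketch shows how DWT sub-band signals can be obtained from a waveform by filtering and downsampling by two. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not name the wavelet, so the Haar filters are an assumption, and the helper names `dwt_split` and `two_level_subbands` are hypothetical. Splitting both first-level branches again gives 2 + 4 = 6 filtering steps, which is one possible reading of the six convolution operations mentioned above; the abstract does not confirm this.

```python
import math

# Haar analysis filters. ASSUMPTION: the abstract does not state which
# wavelet WA-FSN uses; Haar is chosen only because it is the simplest.
S = 1.0 / math.sqrt(2.0)
LOW = (S, S)     # low-pass (approximation) filter
HIGH = (S, -S)   # high-pass (detail) filter


def dwt_split(x):
    """One DWT analysis step (hypothetical helper).

    Applies each 2-tap filter with a stride of 2, which is equivalent to
    filtering followed by downsampling by 2. Returns (approx, detail),
    each half the (even-padded) input length.
    """
    x = list(x)
    if len(x) % 2:          # zero-pad odd-length input to an even length
        x.append(0.0)
    approx = [LOW[0] * x[i] + LOW[1] * x[i + 1] for i in range(0, len(x), 2)]
    detail = [HIGH[0] * x[i] + HIGH[1] * x[i + 1] for i in range(0, len(x), 2)]
    return approx, detail


def two_level_subbands(x):
    """Split both first-level branches again (hypothetical helper).

    Uses 2 filtering steps at level one and 4 at level two. The four
    sub-bands are returned in filter-path order (low-low, low-high,
    high-low, high-high), which is not necessarily frequency order.
    """
    a, d = dwt_split(x)
    aa, ad = dwt_split(a)
    da, dd = dwt_split(d)
    return [aa, ad, da, dd]


if __name__ == "__main__":
    signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    for name, band in zip(["LL", "LH", "HL", "HH"], two_level_subbands(signal)):
        print(name, band)
```

Because the Haar filters are orthonormal, the sub-bands together carry the same energy as the input, so no information is discarded by the split. In a model such as the one described, sub-band signals like these would be framed and fed to the network alongside the STFT magnitude spectrogram; that framing step is not shown here.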
Wu, Z. T., & Hung, J. W. (2025). Improving the Speech Enhancement Model with Discrete Wavelet Transform Sub-Band Features in Adaptive FullSubNet †. Electronics (Switzerland), 14(7). https://doi.org/10.3390/electronics14071354