Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement

Abstract

Recent deep learning-based speech enhancement models have made extensive use of attention mechanisms, demonstrating their effectiveness in achieving state-of-the-art results. This paper proposes a transformer attention network based sub-convolutional U-Net (TANSCUNet) for speech enhancement. Instead of adopting conventional RNNs or temporal convolutional networks for sequence modeling, we employ a novel transformer-based attention network between the sub-convolutional U-Net encoder and decoder for better feature learning. Specifically, it is composed of several adaptive time-frequency attention modules and an adaptive hierarchical attention module, which capture long-range time-frequency dependencies and further aggregate hierarchical contextual information. Additionally, the sub-convolutional encoder-decoder uses different kernel sizes to extract multi-scale local and contextual features from the noisy speech. Experimental results show that the proposed model outperforms several state-of-the-art methods.
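To illustrate the multi-scale idea behind the sub-convolutional encoder-decoder, here is a minimal PyTorch sketch of a block that applies parallel convolutions with different kernel sizes and concatenates the results. The class name `SubConvBlock`, the kernel sizes (3, 5, 7), and the even channel split are assumptions for illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class SubConvBlock(nn.Module):
    """Hypothetical multi-scale sub-convolutional block: parallel 2-D
    convolutions with different kernel sizes, concatenated along the
    channel axis to combine local and contextual features."""

    def __init__(self, in_ch: int, out_ch: int, kernel_sizes=(3, 5, 7)):
        super().__init__()
        branch_ch = out_ch // len(kernel_sizes)  # split channels evenly (assumed)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, branch_ch, k, padding=k // 2),  # odd k keeps T-F shape
                nn.BatchNorm2d(branch_ch),
                nn.PReLU(),
            )
            for k in kernel_sizes
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Each branch sees the same input; concatenation yields a
        # multi-scale representation of the noisy time-frequency input.
        return torch.cat([b(x) for b in self.branches], dim=1)

# Example: a batch of 4 single-channel time-frequency inputs.
x = torch.randn(4, 1, 161, 100)   # (batch, channel, freq, time)
y = SubConvBlock(1, 48)(x)
print(y.shape)                    # torch.Size([4, 48, 161, 100])
```

Small kernels emphasize local spectral detail while larger ones cover more temporal-spectral context, which is one plausible reading of how varied kernel sizes yield the multi-scale features the abstract describes.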

Citation (APA)

Yecchuri, S., & Vanambathina, S. D. (2024). Sub-convolutional U-Net with transformer attention network for end-to-end single-channel speech enhancement. Eurasip Journal on Audio, Speech, and Music Processing, 2024(1). https://doi.org/10.1186/s13636-024-00331-z
