U-Shaped Transformer with Frequency-Band Aware Attention for Speech Enhancement

Yi Li; Yang Sun; Wenwu Wang; Syed Mohsen Naqvi

Journal Article

IEEE/ACM Transactions on Audio, Speech, and Language Processing (2023) 31 1511-1521

DOI: 10.1109/TASLP.2023.3265839

39 Citations

23 Readers

Abstract

Recently, the Transformer has shown the potential to exploit long-range sequence dependencies in speech through self-attention. It has been introduced in single-channel speech enhancement to improve the accuracy of speech estimation from a noisy mixture. However, the amount of information represented across attention heads is often large, which increases computational complexity. To address this issue, axial attention has been proposed, which splits a 2-D attention into two 1-D attentions. In this paper, we develop a new method for speech enhancement that leverages axial attention: time and frequency sub-attention maps are generated by calculating the attention map along the time axis and the frequency axis. Unlike conventional axial attention, the proposed method provides two parallel multi-head attentions, one for the time axis and one for the frequency axis. Moreover, frequency-band aware attention is proposed, consisting of high frequency-band attention (HFA) and low frequency-band attention (LFA), which facilitates the exploitation of information related to speech and noise in different frequency bands of the noisy mixture. To reuse high-resolution feature maps from the encoder, we design a U-shaped Transformer, which helps recover information lost from the high-level representations and further improves speech estimation accuracy. Extensive experiments on four public datasets demonstrate the efficacy of the proposed method.
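The idea of splitting a 2-D attention into two 1-D attentions, computed in parallel along the time and frequency axes, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the attention is single-head with identity projections (queries, keys, and values are the input itself), and fusing the two branches by summation is an assumption made only to keep the example small. The helper `score_counts` shows why the split reduces cost: full attention over all T x F positions scores (T*F)^2 pairs, while the two axial attentions together score T*F*(T+F) pairs.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Single-head scaled dot-product self-attention over a list of vectors.

    Assumption: identity projections (Q = K = V = seq), no learned weights.
    """
    d = len(seq[0])
    out = []
    for q in seq:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in seq]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, seq)) for i in range(d)])
    return out

def parallel_axial_attention(x):
    """x[t][f] is the feature vector at time frame t and frequency bin f.

    Both branches read the same input (parallel, not cascaded):
      - frequency branch: within each frame, attend across frequency bins
      - time branch: within each frequency bin, attend across frames
    Assumption: the two branch outputs are fused by element-wise sum.
    """
    T, F = len(x), len(x[0])
    freq_out = [self_attention(x[t]) for t in range(T)]
    time_cols = [self_attention([x[t][f] for t in range(T)]) for f in range(F)]
    time_out = [[time_cols[f][t] for f in range(F)] for t in range(T)]
    return [
        [[a + b for a, b in zip(time_out[t][f], freq_out[t][f])] for f in range(F)]
        for t in range(T)
    ]

def score_counts(T, F):
    """Number of pairwise attention scores: (full 2-D attention, two axial attentions)."""
    positions = T * F
    return positions ** 2, positions * (T + F)

if __name__ == "__main__":
    # Toy input: 4 frames, 3 frequency bins, 2 features per position.
    x = [[[0.1 * t, 0.2 * f] for f in range(3)] for t in range(4)]
    y = parallel_axial_attention(x)
    print(len(y), len(y[0]), len(y[0][0]))
    print(score_counts(4, 3))
```

A frequency-band aware variant in the spirit of HFA and LFA would apply attention separately to a low and a high subset of the frequency bins; where the band boundary lies is not stated in the abstract, so it is left out of this sketch.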

Cite

Li, Y., Sun, Y., Wang, W., & Naqvi, S. M. (2023). U-Shaped Transformer with Frequency-Band Aware Attention for Speech Enhancement. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 31, 1511–1521. https://doi.org/10.1109/TASLP.2023.3265839