Speech enhancement for audio with a low signal-to-noise ratio (SNR) is challenging. Existing speech enhancement methods are designed mainly for high-SNR audio and usually rely on RNNs to model audio sequence features, which prevents the model from learning long-distance dependencies and limits its performance in low-SNR speech enhancement tasks. To overcome this problem, we design a complex transformer module with sparse attention. Unlike the traditional transformer, the model is extended to effectively process complex-domain sequences: a sparse attention mask balances the model's attention between long-distance and nearby relations, a pre-layer positional embedding module strengthens the model's perception of position information, and a channel attention module lets the model dynamically adjust the weight distribution across channels according to the input audio. Experimental results show that, in low-SNR speech enhancement tests, our model achieves noticeable improvements in both speech quality and intelligibility.
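To make the sparse-attention idea concrete, below is a minimal sketch (not the authors' implementation; the abstract does not specify the exact mask pattern) of a mask that mixes a local band, covering nearby relations, with strided links, covering long-distance relations. The function name and the `window` and `stride` hyperparameters are illustrative assumptions.

```python
import torch

def sparse_attention_mask(seq_len: int, window: int = 8, stride: int = 32) -> torch.Tensor:
    """Boolean mask of shape (seq_len, seq_len); True = attention allowed.

    Hypothetical sketch in the spirit of the abstract: each query frame
    attends to a local window (nearby relations) plus every `stride`-th
    key frame (long-distance relations).
    """
    idx = torch.arange(seq_len)
    # Local band: keys within `window` positions of the query.
    local = (idx[:, None] - idx[None, :]).abs() <= window
    # Strided links: every `stride`-th key is visible to all queries.
    strided = (idx[None, :] % stride) == 0
    return local | strided

# Usage: mask out disallowed query-key pairs before the softmax.
mask = sparse_attention_mask(256)
scores = torch.randn(256, 256)                  # query-key attention logits
scores = scores.masked_fill(~mask, float("-inf"))
attn = scores.softmax(dim=-1)                   # sparse attention weights
```

Because the diagonal always falls inside the local band, every row of the mask has at least one allowed key, so the softmax never produces an all-masked (NaN) row.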
Tan, K., Mao, W., Guo, X., Lu, H., Zhang, C., Cao, Z., & Wang, X. (2023). CST: Complex Sparse Transformer for Low-SNR Speech Enhancement. Sensors, 23(5), 2376. https://doi.org/10.3390/s23052376