CST: Complex Sparse Transformer for Low-SNR Speech Enhancement


Abstract

Speech enhancement for audio with a low signal-to-noise ratio (SNR) is challenging. Existing speech enhancement methods are mainly designed for high-SNR audio and usually rely on RNNs to model audio sequence features, which prevents the model from learning long-distance dependencies and limits its performance on low-SNR speech enhancement tasks. To overcome this problem, we design a complex transformer module with sparse attention. Unlike the traditional transformer model, this model is extended to effectively model complex-domain sequences: a sparse attention mask balances the model's attention between long-distance and nearby relations, a pre-layer positional embedding module enhances the model's perception of position information, and a channel attention module enables the model to dynamically adjust the weight distribution across channels according to the input audio. Experimental results show that, in low-SNR speech enhancement tests, our model achieves noticeable improvements in both speech quality and intelligibility.
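The abstract does not include code; the following is a minimal PyTorch sketch of how a sparse attention mask balancing nearby and long-distance relations might be built, in the spirit described above. The function names, window size, and stride are illustrative assumptions, not taken from the paper.

```python
import torch

def sparse_attention_mask(seq_len: int, window: int = 8, stride: int = 32) -> torch.Tensor:
    """Boolean mask (True = attend) combining a local window with strided
    long-distance positions, loosely following the paper's idea of balancing
    nearby and long-range relations. `window` and `stride` are illustrative."""
    idx = torch.arange(seq_len)
    dist = (idx[:, None] - idx[None, :]).abs()
    local = dist <= window            # nearby relations
    strided = (dist % stride) == 0    # sparse long-distance relations
    return local | strided

def masked_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor,
                     mask: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product attention with the sparse mask applied;
    disallowed positions are set to -inf before the softmax."""
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Usage: random (batch, time, feature) tensors standing in for spectral features.
q = k = v = torch.randn(1, 128, 64)
out = masked_attention(q, k, v, sparse_attention_mask(128))
```

For complex-domain spectrograms, such a mask could be applied to attention over the real and imaginary parts; the paper's exact complex formulation is not reproduced here.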

Citation (APA)

Tan, K., Mao, W., Guo, X., Lu, H., Zhang, C., Cao, Z., & Wang, X. (2023). CST: Complex Sparse Transformer for Low-SNR Speech Enhancement. Sensors, 23(5). https://doi.org/10.3390/s23052376
