Dual-Path Hybrid Attention Network for Monaural Speech Separation

This article is free to access.

Abstract

Recent advances in time-domain speech separation methods, particularly those that use attention mechanisms to model sequences, have significantly improved speech separation performance. In this paper, we address monaural (single-microphone) speaker separation, mainly for the case of two concurrent speakers. We propose a dual-path hybrid attention network (DPHA-Net) for monaural speech separation in the time domain. The critical component of DPHA-Net, the DPHA module, combines multiple attention mechanisms and is designed to capture both short-term and long-term context dependencies. The DPHA module consists of multi-head self-attention (MHSA), element-wise attention (EA), and adaptive feature fusion (AFF) units. We also propose an improved multi-stage aggregation training strategy, which proves very effective for audio separation in our experiments. Results on the benchmark WSJ0-2mix, WHAM! and Libri2Mix datasets show that DPHA-Net achieves competitive performance. For two-speaker separation on the WSJ0-2mix dataset, DPHA-Net outperforms the state of the art by 0.3 dB absolute in SI-SNRi and by 0.4 dB absolute in SDRi under the same conditions.
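
The SI-SNRi figure quoted above is the improvement in scale-invariant signal-to-noise ratio of the separated estimate over the unprocessed mixture. The following is a minimal sketch of the commonly used definition, not the authors' evaluation code; the function names `si_snr` and `si_snr_improvement`, the `eps` stabilizer, and the synthetic signals in the usage note are assumptions made for illustration.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between two 1-D signals of equal length."""
    # Remove the mean so the measure ignores DC offsets.
    estimate = estimate - estimate.mean()
    target = target - target.mean()
    # Project the estimate onto the target to get the target component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target
    # Whatever is left after the projection is treated as noise.
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


def si_snr_improvement(estimate, target, mixture):
    """SI-SNRi: SI-SNR of the estimate minus SI-SNR of the raw mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)
```

As a usage illustration, with `target` and `interferer` drawn as independent noise, `mixture = target + interferer`, and `estimate = target + 0.1 * interferer`, the call `si_snr_improvement(estimate, target, mixture)` returns a positive value, and rescaling `estimate` leaves `si_snr` essentially unchanged.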

Qiu, W., & Hu, Y. (2022). Dual-Path Hybrid Attention Network for Monaural Speech Separation. IEEE Access, 10, 78754–78763. https://doi.org/10.1109/ACCESS.2022.3193245
