The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio

Shan Liang; Wenju Liu; Wei Jiang; Wei Xue

Journal Article (Open Access)

The Journal of the Acoustical Society of America (2013) 134(5) EL452-EL458

DOI: 10.1121/1.4824632

Abstract

In this paper, a computational goal for a monaural speech separation system is proposed. Since this goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-Disjoint Orthogonality assumption, which almost always holds due to the sparse nature of speech, theoretical analysis shows that the ORM can improve the SNR by about 10 log₁₀ 2 dB (roughly 3 dB) over the ideal ratio mask. With three kinds of real-world interference, the speech separation results in terms of SNR gain and objective quality evaluation confirm the theoretical analysis and indicate that the ORM achieves better separation performance.
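The comparison described in the abstract can be illustrated with a small numerical sketch. It is not taken from the paper: the random complex arrays `S` and `N` stand in for the STFT coefficients of speech and interference, the ideal ratio mask is assumed to be the power ratio |S|²/(|S|² + |N|²), and the SNR-maximizing mask is assumed to be the per-bin least-squares solution Re(S·Y*)/|Y|², where Y = S + N. The names `snr_db`, `irm`, and `ls_mask` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def snr_db(target, estimate):
    """SNR in dB of an estimate against the clean target (time-frequency domain)."""
    err = target - estimate
    return 10.0 * np.log10(np.sum(np.abs(target) ** 2) / np.sum(np.abs(err) ** 2))

# Synthetic stand-ins for STFT coefficients (assumption: not real speech data).
rng = np.random.default_rng(0)
shape = (257, 100)  # (frequency bins, frames), chosen arbitrarily
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)  # "speech"
N = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)  # "interference"
Y = S + N                                                         # mixture

ps = np.abs(S) ** 2                 # speech power per bin
pn = np.abs(N) ** 2                 # interference power per bin
cross = np.real(S * np.conj(N))     # cross term Re(S N*), ignored by the IRM

# Ideal ratio mask, assumed here in its power-ratio form.
irm = ps / (ps + pn)

# Real-valued mask minimizing |S - M*Y|^2 in every bin, hence maximizing SNR.
# The denominator equals |Y|^2; a practical implementation would guard against zeros.
ls_mask = (ps + cross) / (ps + pn + 2.0 * cross)

snr_irm = snr_db(S, irm * Y)
snr_ls = snr_db(S, ls_mask * Y)

# The gain quoted in the abstract, 10*log10(2), expressed in dB.
quoted_gain_db = 10.0 * np.log10(2.0)

print(f"SNR with IRM:            {snr_irm:.2f} dB")
print(f"SNR with SNR-max. mask:  {snr_ls:.2f} dB")
print(f"10*log10(2):             {quoted_gain_db:.2f} dB")
```

Because the second mask minimizes the error in each bin over all real-valued masks, its SNR can never fall below that of the ideal ratio mask. The random data here are not sparse like speech, so the measured gain need not match the 10 log₁₀ 2 dB figure, which the abstract derives under the approximate W-Disjoint Orthogonality assumption.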

Citation (APA)

Liang, S., Liu, W., Jiang, W., & Xue, W. (2013). The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio. The Journal of the Acoustical Society of America, 134(5), EL452–EL458. https://doi.org/10.1121/1.4824632
