The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio

  • Liang S
  • Liu W
  • Jiang W
  • Xue W

Abstract

This paper proposes a computational goal for monaural speech separation systems. Because the goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-Disjoint Orthogonality assumption, which almost always holds because speech is sparse, theoretical analysis shows that the ORM can improve the SNR by about 10 log10(2) dB (roughly 3 dB) over the ideal ratio mask (IRM). Speech separation experiments with three kinds of real-world interference, evaluated by SNR gain and objective quality measures, support the theoretical analysis and indicate that the ORM achieves better separation performance.
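The abstract does not give the mask formulas, so the following is only a minimal sketch under stated assumptions. It takes the ideal ratio mask to be the power ratio |S|^2 / (|S|^2 + |N|^2), and it stands in for the ORM with the real-valued mask that minimizes the error energy in each time-frequency unit, Re(S Y*) / |Y|^2 with Y = S + N, which is what maximizing the output SNR amounts to for a real mask applied to the mixture. Whether this matches the paper's exact definition is an assumption. The function names and the random test spectra are hypothetical, and random Gaussian spectra are not sparse, so the example does not reproduce the reported gain; it only shows that the SNR-optimal mask never does worse than the ratio mask.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero in empty T-F units


def ideal_ratio_mask(S, N):
    """Power-ratio mask (assumed IRM definition): |S|^2 / (|S|^2 + |N|^2)."""
    ps, pn = np.abs(S) ** 2, np.abs(N) ** 2
    return ps / (ps + pn + EPS)


def snr_optimal_real_mask(S, N):
    """Real mask minimizing |S - M*(S+N)|^2 per unit (assumed stand-in for the ORM)."""
    Y = S + N
    return np.real(S * np.conj(Y)) / (np.abs(Y) ** 2 + EPS)


def output_snr_db(S, N, mask):
    """SNR of the masked mixture, measured against the clean spectrum S."""
    err = S - mask * (S + N)
    return 10 * np.log10(np.sum(np.abs(S) ** 2) / np.sum(np.abs(err) ** 2))


# Hypothetical complex spectra standing in for STFTs of speech and noise.
rng = np.random.default_rng(0)
shape = (257, 100)  # (frequency bins, frames), chosen arbitrarily
S = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
N = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

snr_irm = output_snr_db(S, N, ideal_ratio_mask(S, N))
snr_orm = output_snr_db(S, N, snr_optimal_real_mask(S, N))
bound_db = 10 * np.log10(2)  # the gain figure quoted in the abstract

print(f"IRM output SNR: {snr_irm:.2f} dB")
print(f"SNR-optimal mask output SNR: {snr_orm:.2f} dB")
print(f"10 log10(2) = {bound_db:.2f} dB")
```

Unlike the power-ratio mask, the SNR-optimal mask depends on the phase relation between speech and interference through the cross term Re(S N*), so it can fall outside the range [0, 1].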

Cite

Liang, S., Liu, W., Jiang, W., & Xue, W. (2013). The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio. The Journal of the Acoustical Society of America, 134(5), EL452–EL458. https://doi.org/10.1121/1.4824632
