Abstract
This paper proposes a computational goal for monaural speech separation systems. Because the goal is derived by maximizing the signal-to-noise ratio (SNR), it is called the optimal ratio mask (ORM). Under the approximate W-Disjoint Orthogonality assumption, which almost always holds because speech is sparse, theoretical analysis shows that the ORM can improve the SNR by about 10 log10(2) dB (roughly 3 dB) over the ideal ratio mask. Speech separation experiments with three kinds of real-world interference, evaluated by SNR gain and objective quality measures, support the theoretical analysis and indicate that the ORM achieves better separation performance.
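The quantities named in the abstract can be illustrated with a small numerical sketch. The abstract gives no formulas, so everything below is an assumption made for illustration: the ideal ratio mask is taken in its common power-ratio form, and `ls_mask` is a hypothetical stand-in for an SNR-maximizing mask, namely the real-valued gain per time-frequency unit that minimizes the squared error to the clean speech. The synthetic data are random complex numbers, not speech, so the script is not expected to reproduce the gain reported in the paper. It only evaluates the constant 10 log10(2) and shows that the least-squares gain never yields a lower SNR than the power-ratio mask.

```python
import math
import random


def snr_db(target, estimate):
    """SNR in dB of an estimate measured against the clean target."""
    signal = sum(abs(s) ** 2 for s in target)
    error = sum(abs(s - e) ** 2 for s, e in zip(target, estimate))
    return 10 * math.log10(signal / error)


def irm(s, n):
    """Ideal ratio mask in its common power-ratio form (assumed definition)."""
    return abs(s) ** 2 / (abs(s) ** 2 + abs(n) ** 2)


def ls_mask(s, n):
    """Hypothetical SNR-maximizing real gain: argmin over m of |s - m*(s+n)|^2."""
    y = s + n
    return (s * y.conjugate()).real / abs(y) ** 2


# Synthetic complex "spectral" coefficients; a toy stand-in, not real speech.
rng = random.Random(0)
speech = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]

# Apply each mask to the mixture s + n and measure the resulting SNR.
est_irm = [irm(s, n) * (s + n) for s, n in zip(speech, noise)]
est_ls = [ls_mask(s, n) * (s + n) for s, n in zip(speech, noise)]

snr_irm = snr_db(speech, est_irm)
snr_ls = snr_db(speech, est_ls)
bound_db = 10 * math.log10(2)  # the constant quoted in the abstract

print(f"10*log10(2) = {bound_db:.4f} dB")
print(f"SNR with power-ratio mask:    {snr_irm:.2f} dB")
print(f"SNR with least-squares gain:  {snr_ls:.2f} dB")
```

Because the least-squares gain is chosen independently in every unit to minimize the error there, its total error cannot exceed that of any other real-valued mask, which is why the second SNR is at least as large as the first.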
Cite
Liang, S., Liu, W., Jiang, W., & Xue, W. (2013). The optimal ratio time-frequency mask for speech separation in terms of the signal-to-noise ratio. The Journal of the Acoustical Society of America, 134(5), EL452–EL458. https://doi.org/10.1121/1.4824632