Noise perturbation improves supervised speech separation

Abstract

Speech separation can be treated as a mask estimation problem where interference-dominant portions are masked in a time-frequency representation of noisy speech. In supervised speech separation, a classifier is typically trained on a mixture set of speech and noise. Improving the generalization of a classifier is challenging, especially when interfering noise is strong and nonstationary. Expansion of a noise through proper perturbation during training exposes the classifier to more noise variations, and hence may improve separation performance. In this study, we examine the effects of three noise perturbations at low signal-to-noise ratios (SNRs). We evaluate speech separation performance in terms of hit minus false-alarm rate and short-time objective intelligibility (STOI). The experimental results show that frequency perturbation performs the best among the three perturbations. In particular, we find that frequency perturbation reduces the error of misclassifying a noise pattern as a speech pattern.
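The hit minus false-alarm rate named in the abstract can be illustrated with a minimal sketch. It assumes binary masks in which 1 marks a speech-dominant time-frequency unit and 0 a noise-dominant one. The function names, the 0 dB local criterion, and the toy arrays are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Label a T-F unit 1 when its local SNR exceeds lc_db, else 0.

    lc_db (the local criterion) is an assumed parameter for illustration.
    """
    eps = np.finfo(float).eps  # avoids division by zero and log of zero
    speech_energy = np.asarray(speech_energy, dtype=float)
    noise_energy = np.asarray(noise_energy, dtype=float)
    local_snr_db = 10.0 * np.log10((speech_energy + eps) / (noise_energy + eps))
    return (local_snr_db > lc_db).astype(int)


def hit_minus_fa(estimated_mask, ideal_mask):
    """Return HIT - FA for two binary masks of the same shape.

    HIT: fraction of speech-dominant units (ideal == 1) labelled 1.
    FA:  fraction of noise-dominant units (ideal == 0) wrongly labelled 1.
    """
    est = np.asarray(estimated_mask).astype(bool)
    ref = np.asarray(ideal_mask).astype(bool)
    hit = np.logical_and(est, ref).sum() / max(ref.sum(), 1)
    fa = np.logical_and(est, ~ref).sum() / max((~ref).sum(), 1)
    return float(hit - fa)


if __name__ == "__main__":
    # Toy 2x2 "spectrogram" energies, made up for the example.
    ideal = ideal_binary_mask([[4.0, 1.0], [4.0, 1.0]],
                              [[1.0, 4.0], [1.0, 4.0]])
    estimate = np.array([[1, 1], [0, 0]])
    print(ideal)
    print(hit_minus_fa(estimate, ideal))
```

In this framing, misclassifying a noise pattern as a speech pattern raises the false-alarm term, so a reduction in that error shows up as a higher HIT minus FA score.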

Cite

Chen, J., Wang, Y., & Wang, D. L. (2015). Noise perturbation improves supervised speech separation. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9237, pp. 83–90). Springer Verlag. https://doi.org/10.1007/978-3-319-22482-4_10
