Improving reverberant speech separation with binaural cues using temporal context and convolutional neural networks


Abstract

Given binaural features as input, such as interaural level difference (ILD) and interaural phase difference (IPD), Deep Neural Networks (DNNs) have recently been used to localize sound sources in a mixture of speech signals and/or noise, and to create time-frequency masks for estimating the sound sources in reverberant rooms. Here, we explore a more advanced system in which feed-forward DNNs are replaced by Convolutional Neural Networks (CNNs). In addition, the frames adjacent to each time frame (those immediately before and after it) are used to exploit contextual information, improving both localization and separation for each source. The quality of the separation results is evaluated in terms of Signal to Distortion Ratio (SDR).
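The pipeline the abstract describes — ILD/IPD features computed from left/right spectrograms, then stacked with neighbouring frames to form contextual input patches for a CNN — can be sketched as follows. This is an illustrative reconstruction, not the authors' code; the context width and padding strategy are assumptions.

```python
import numpy as np

def binaural_features(stft_left, stft_right, eps=1e-8):
    """Compute interaural level difference (ILD, in dB) and interaural
    phase difference (IPD, in radians) from the left/right-ear STFTs,
    each of shape (freq, time)."""
    ild = 20.0 * np.log10((np.abs(stft_left) + eps) / (np.abs(stft_right) + eps))
    ipd = np.angle(stft_left * np.conj(stft_right))
    return ild, ipd

def stack_context(features, context=2):
    """Attach `context` neighbouring frames on each side of every time
    frame (edge-padded at the boundaries), yielding 2-D patches that a
    CNN can consume: (freq, time) -> (time, freq, 2*context + 1)."""
    freq, time = features.shape
    padded = np.pad(features, ((0, 0), (context, context)), mode="edge")
    return np.stack(
        [padded[:, t : t + 2 * context + 1] for t in range(time)], axis=0
    )
```

For example, with a 257-bin STFT, 100 frames, and a context of 2, each CNN input patch covers 5 consecutive frames across all frequency bins.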

Citation (APA)

Zermini, A., Kong, Q., Xu, Y., Plumbley, M. D., & Wang, W. (2018). Improving reverberant speech separation with binaural cues using temporal context and convolutional neural networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10891 LNCS, pp. 361–371). Springer Verlag. https://doi.org/10.1007/978-3-319-93764-9_34
