Deep Latent Fusion Layers for Binaural Speech Enhancement

Tom Gajecki; Waldo Nogueira

Journal Article, Open Access

IEEE/ACM Transactions on Audio Speech and Language Processing (2023) 31 3127-3138

DOI: 10.1109/TASLP.2023.3301223

Abstract

This work addresses speech enhancement in binaural hearing scenarios. Specifically, we present a method to improve binaural noise reduction by integrating latent features produced by monaural speech enhancement algorithms through 'fusion layers.' These layers draw inspiration from multi-task learning techniques, which share model weights across models that handle interrelated tasks. Each fusion layer computes a Hadamard (element-wise) product between the tensors holding the latent representations at the same processing stage, mimicking the physiological excitatory and inhibitory mechanisms of the binaural hearing system. This study first presents a general fusion model and demonstrates that it fits synthetic data better than independent linear models, equalizes activation variance between learning modules, and exploits input data redundancy to reduce the training error. We then apply the concept of fusion layers to enhance speech in binaural listening conditions. The proposed method shows promise for improved noise reduction compared to other feature-sharing approaches. The study also suggests that including fusion can enhance predicted speech intelligibility and quality, but that fusing too many features may reduce predicted speech intelligibility. Furthermore, the results suggest that fusion layers should share parameterized latent representations, rather than deterministic ones, to make effective use of the information from each listening side. Overall, this study highlights the potential of sharing information between speech enhancement modules through deep fusion layers to improve binaural speech enhancement while keeping the number of trainable parameters constant and improving generalization.
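
The core operation named in the abstract, a Hadamard product between two latent representations taken at the same processing stage, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `fuse_latents`, the tensor shape (channels, frames), and the random inputs are all assumptions made for illustration, and the sketch does not show how the fused tensor is used by the rest of the network.

```python
import numpy as np


def fuse_latents(left, right):
    """Return the Hadamard (element-wise) product of two latent tensors.

    Hypothetical helper: both tensors are assumed to come from the same
    processing stage of two monaural enhancement modules, so their shapes
    must match exactly.
    """
    if left.shape != right.shape:
        raise ValueError("latent tensors must have the same shape")
    return left * right


# Assumed latent tensors of shape (channels, frames), one per listening side.
rng = np.random.default_rng(0)
latent_left = rng.standard_normal((4, 8))
latent_right = rng.standard_normal((4, 8))

fused = fuse_latents(latent_left, latent_right)
```

Because the product is element-wise, the operation itself adds no trainable parameters, which is consistent with the abstract's remark that the number of trainable parameters stays constant.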

Cite

Gajecki, T., & Nogueira, W. (2023). Deep Latent Fusion Layers for Binaural Speech Enhancement. IEEE/ACM Transactions on Audio Speech and Language Processing, 31, 3127–3138. https://doi.org/10.1109/TASLP.2023.3301223
