Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain

Abstract

Interest in new audio formats for virtual reality (VR) is growing, and higher-order ambisonics (HOA) is preferred for transmitting recorded scenes in VR systems owing to its transmission efficiency and its flexibility across loudspeaker setups. However, conversion between the HOA format and another well-known format, the object format, has not been fully addressed in the literature. Blind source separation in the spherical harmonic (SH) domain can be considered the most efficient way to extract objects, because the HOA signals need not be decoded before separation. A few authors have attempted to extract objects directly from encoded HOA signals by using multichannel non-negative matrix factorization (MNMF), but these approaches either assume only far-field sources or do not take array characteristics into account, which makes them difficult to use for VR in practical situations where singers or speakers often perform close to the microphones. Furthermore, MNMF generally incurs a high computational cost, even though transforming to the SH domain reduces the dimensionality. In this work, we also model near-field sources by estimating the model parameters of non-negative tensor factorization (NTF) in the SH domain, assuming that the microphone signals are obtained with a rigid spherical array. We propose a masking scheme that excludes noisy evanescent regions in the SH domain from the NTF cost function. Evaluations show that our method outperforms existing methods devised for the HOA format and that the masking approach is effective in improving the separation quality.
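The idea of excluding a region from an NTF cost function can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a real, non-negative tensor X[c, f, t] (SH coefficient index, frequency, time), a rank-K factorization into three factor matrices, a squared-error cost, and a binary mask over (c, f) that is simply given as input. The abstract does not state the cost function or how the evanescent regions are identified, so the function name masked_ntf, the cost, and the mask are all hypothetical choices made for illustration.

```python
import numpy as np

def masked_ntf(X, mask, n_components, n_iter=100, seed=0, eps=1e-12):
    """Fit X[c, f, t] ~ sum_k Q[c, k] * W[f, k] * H[t, k] on unmasked entries.

    mask[c, f] is 1 (or True) where an entry contributes to the cost and
    0 (or False) where it is excluded. All names here are illustrative.
    """
    rng = np.random.default_rng(seed)
    C, F, T = X.shape
    Q = rng.random((C, n_components)) + 0.1
    W = rng.random((F, n_components)) + 0.1
    H = rng.random((T, n_components)) + 0.1

    # Broadcast the (C, F) mask over the time axis.
    M = np.broadcast_to(np.asarray(mask, dtype=float)[:, :, None], X.shape)
    MX = M * X  # masked-out observations never enter the updates

    def model():
        return np.einsum('ck,fk,tk->cft', Q, W, H)

    costs = []
    for _ in range(n_iter):
        # Multiplicative updates for the mask-weighted squared error.
        Q *= np.einsum('cft,fk,tk->ck', MX, W, H) / (
            np.einsum('cft,fk,tk->ck', M * model(), W, H) + eps)
        W *= np.einsum('cft,ck,tk->fk', MX, Q, H) / (
            np.einsum('cft,ck,tk->fk', M * model(), Q, H) + eps)
        H *= np.einsum('cft,ck,fk->tk', MX, Q, W) / (
            np.einsum('cft,ck,fk->tk', M * model(), Q, W) + eps)
        # Cost is summed over unmasked entries only.
        costs.append(float(np.sum(M * (X - model()) ** 2)))
    return Q, W, H, costs

# Toy usage with random data and a random mask (both purely illustrative).
rng = np.random.default_rng(1)
X = rng.random((4, 6, 8))
mask = rng.random((4, 6)) > 0.3
Q, W, H, costs = masked_ntf(X, mask, n_components=2)
```

Because the mask multiplies both the observation and the model inside every update, entries in the excluded region have no influence on the estimated factors, whatever values they hold. That is the property a masking scheme of this kind relies on when the excluded entries are dominated by noise.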

Cite

Mitsufuji, Y., Takamune, N., Koyama, S., & Saruwatari, H. (2021). Multichannel Blind Source Separation Based on Evanescent-Region-Aware Non-Negative Tensor Factorization in Spherical Harmonic Domain. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29, 607–617. https://doi.org/10.1109/TASLP.2020.3045528
