Optimal transport as a loss for machine learning optimization problems has recently attracted considerable attention. Building upon recent advances in computational optimal transport, we develop an optimal transport non-negative matrix factorization (NMF) algorithm for supervised speech blind source separation (BSS). Optimal transport allows us to design and leverage a cost between short-time Fourier transform (STFT) spectrogram frequencies that takes into account how humans perceive sound. We give empirical evidence that our proposed optimal transport NMF leads to perceptually better results than NMF with other losses, for both isolated voice reconstruction and speech denoising using BSS. Finally, we demonstrate how to use optimal transport for cross-domain sound processing tasks, where the frequencies represented in the input spectrograms may differ from one spectrogram to another.
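To illustrate why a cost between frequencies matters, the following is a minimal sketch and not the algorithm from the paper. It computes an entropy-regularized optimal transport cost between two normalized spectrogram columns using plain Sinkhorn iterations. The function names, the half-octave bin spacing, the squared log-frequency ground cost, and the regularization value are all assumptions chosen for illustration.

```python
import math

def sinkhorn_cost(a, b, cost, reg=0.5, n_iter=200):
    """Entropy-regularized OT cost between two histograms of equal total mass.

    Illustrative sketch only: plain Sinkhorn scaling iterations, with no
    log-domain stabilization.
    """
    n, m = len(a), len(b)
    # Gibbs kernel derived from the ground cost between frequency bins.
    K = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iter):
        # Alternately rescale rows and columns to match the two marginals.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport cost <P, C> with plan P = diag(u) K diag(v).
    return sum(u[i] * K[i][j] * v[j] * cost[i][j]
               for i in range(n) for j in range(m))

def peak(n_bins, k):
    """Normalized spectrogram column with all energy in frequency bin k."""
    return [1.0 if i == k else 0.0 for i in range(n_bins)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Assumption: 8 frequency bins spaced half an octave apart, with positions
# expressed in octaves. The ground cost is the squared distance in octaves.
n_bins = 8
octaves = [k / 2 for k in range(n_bins)]
C = [[(fi - fj) ** 2 for fj in octaves] for fi in octaves]

ref = peak(n_bins, 2)
near = peak(n_bins, 3)  # shifted by one bin
far = peak(n_bins, 6)   # shifted by four bins

ot_near = sinkhorn_cost(ref, near, C)
ot_far = sinkhorn_cost(ref, far, C)
l2_near = euclidean(ref, near)
l2_far = euclidean(ref, far)
```

In this toy setting the Euclidean distance is the same for the small and the large frequency shift, whereas the transport cost grows with the size of the shift. This is the property that lets a ground cost encode perceptual similarity between frequencies.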
Rolet, A., Seguy, V., Blondel, M., & Sawada, H. (2018). Blind source separation with optimal transport non-negative matrix factorization. EURASIP Journal on Advances in Signal Processing, 2018(1). https://doi.org/10.1186/s13634-018-0576-2