An Efficient Short-Time Discrete Cosine Transform and Attentive MultiResUNet Framework for Music Source Separation

Abstract

The music source separation problem, in which the task is to estimate the audio components present in a mixture, has long been at the centre of research activity. More recent frameworks tackle the problem with deep learning models that attempt to extract information about each component from Short-Time Fourier Transform (STFT) spectrograms given as input. Most approaches assume that only one source is present at each time-frequency point, which allows that point of the mixture to be allocated to the desired source. Since this assumption is strong and is reported not to hold in practice, using the magnitude of the STFT as input to these networks raises a problem: the Fourier phase information is absent when the separated sources are reconstructed. Recovering the Fourier phase is neither easily tractable nor computationally efficient. In this paper, we propose a novel Attentive MultiResUNet architecture that uses real-valued Short-Time Discrete Cosine Transform (STDCT) data as input. This choice avoids the phase recovery problem, because the appropriate values are estimated within the network itself rather than by complex estimation or post-processing algorithms. The proposed network features a U-Net type structure with residual skip connections and an attention mechanism that correlates the skip connection and the decoder output at the previous level. The proposed network is used for the first time in source separation, and it achieves favourable performance compared to state-of-the-art separation networks at a fraction of their computational cost.

Cite

Sgouros, T., Bousis, A., & Mitianoudis, N. (2022). An Efficient Short-Time Discrete Cosine Transform and Attentive MultiResUNet Framework for Music Source Separation. IEEE Access, 10, 119448–119459. https://doi.org/10.1109/ACCESS.2022.3221766
