Music source separation has traditionally followed the encoder-decoder paradigm (e.g., hourglass, U-Net, DeconvNet, SegNet) to isolate individual music components from mixtures. Such networks, however, lose location sensitivity, because their low-resolution representations drop useful harmonic patterns along the temporal dimension. We overcome this problem by performing singing voice separation with a high-resolution representation learning network (HRNet) coupled with a long short-term memory (LSTM) module, which together retain high-resolution feature maps and capture the temporal behavior of the acoustic signal. We call this combination of HRNet and LSTM HR-LSTM. The spectrograms predicted by this system are close to the ground truth and successfully separate the music sources, achieving results superior to those of past methods. The proposed network was tested on four datasets (DSD100, MIR-1K, Korean Pansori, and Nepal Idol singing voice, NISVS). Our experiments confirm that the proposed HR-LSTM outperforms state-of-the-art networks at singing voice separation on the DSD100 dataset, performs comparably to alternative methods on the MIR-1K dataset, and separates the voice and accompaniment components well on the Pansori and NISVS datasets. In addition to proposing and validating our network, we developed and have shared the Nepal Idol dataset.
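As a rough illustration of the idea summarized above, the sketch below combines a resolution-preserving convolutional front end (standing in for the HRNet branch) with a bidirectional LSTM over the time axis, and predicts a soft mask on the mixture magnitude spectrogram to estimate the vocal component. This is a minimal assumption-laden sketch, not the authors' exact architecture; the class name, layer sizes, and mask-based output are illustrative choices only.

```python
# Minimal HR-LSTM-style sketch (assumed layer sizes and names, not the paper's exact model).
import torch
import torch.nn as nn


class HRLSTMSketch(nn.Module):
    def __init__(self, n_freq_bins: int = 512, channels: int = 32, hidden_dim: int = 256):
        super().__init__()
        # Convolutional stem with stride 1 and same-padding, so the
        # time-frequency resolution of the feature map is preserved
        # (a stand-in for HRNet's high-resolution branch).
        self.stem = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Bidirectional LSTM along the temporal dimension: each time step
        # receives the full-frequency feature vector of one frame.
        self.lstm = nn.LSTM(
            input_size=channels * n_freq_bins,
            hidden_size=hidden_dim,
            num_layers=2,
            batch_first=True,
            bidirectional=True,
        )
        # Project back to one mask value per frequency bin per frame.
        self.mask_head = nn.Linear(2 * hidden_dim, n_freq_bins)

    def forward(self, mixture_spec: torch.Tensor) -> torch.Tensor:
        # mixture_spec: (batch, freq_bins, time_frames) magnitude spectrogram.
        b, f, t = mixture_spec.shape
        x = self.stem(mixture_spec.unsqueeze(1))     # (b, c, f, t)
        x = x.permute(0, 3, 1, 2).reshape(b, t, -1)  # (b, t, c * f)
        x, _ = self.lstm(x)                          # (b, t, 2 * hidden_dim)
        mask = torch.sigmoid(self.mask_head(x))      # (b, t, f)
        mask = mask.transpose(1, 2)                  # (b, f, t)
        # Soft-mask the mixture to estimate the vocal spectrogram; the
        # accompaniment estimate would be the residual (1 - mask) * mixture.
        return mask * mixture_spec


if __name__ == "__main__":
    model = HRLSTMSketch(n_freq_bins=512)
    mixture = torch.rand(2, 512, 100)   # two clips, 512 bins, 100 frames
    vocals = model(mixture)
    print(vocals.shape)                 # torch.Size([2, 512, 100])
```

The separated waveform would then be recovered by pairing the estimated magnitude with the mixture phase and applying the inverse STFT, a common post-processing step for mask-based separation systems.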
Bhattarai, B., Pandeya, Y. R., Jie, Y., Lamichhane, A. K., & Lee, J. (2023). High-Resolution Representation Learning and Recurrent Neural Network for Singing Voice Separation. Circuits, Systems, and Signal Processing, 42(2), 1083–1104. https://doi.org/10.1007/s00034-022-02166-5