Vocal melody extraction via HRNet-based singing voice separation and encoder-decoder-based F0 estimation


Abstract

Vocal melody extraction is an important and challenging task in music information retrieval. One main difficulty is that, most of the time, the singing voice and the accompanying instruments are mixed with overlapping harmonic structures, which makes it hard to identify the fundamental frequency (F0) of the singing voice. Reducing the interference of the accompaniment is therefore beneficial to pitch estimation of the singing voice. In this paper, we first adopted a high-resolution network (HRNet) to separate vocals from polyphonic music, and then designed an encoder-decoder network to estimate the vocal F0 values. Experimental results demonstrate the effectiveness of the HRNet-based singing voice separation method in reducing the interference of the accompaniment on vocal melody extraction, and show that the proposed vocal melody extraction (VME) system outperforms other state-of-the-art algorithms in most cases.
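To make the two-stage idea concrete, here is a toy sketch in Python with NumPy. It is not the authors' method: the HRNet separation stage is replaced by an oracle ideal ratio mask computed from a known vocal spectrogram, and the encoder-decoder F0 estimator is replaced by a per-frame peak picker with a voicing threshold. The function names, the threshold, and the toy spectrograms are hypothetical, and real accompaniment overlaps the voice far more than this example does. What it shows is the motivation stated above: on the raw mixture a louder accompaniment bin dominates the pitch estimate, while the separated spectrogram exposes the vocal pitch.

```python
import numpy as np


def separate_vocals(mix_mag, vocal_mask):
    """Stage 1 stand-in (hypothetical): apply a soft vocal mask to the mixture
    magnitude spectrogram. The paper uses an HRNet here; this sketch simply
    takes the mask as given."""
    return mix_mag * np.clip(vocal_mask, 0.0, 1.0)


def estimate_f0(mag, bin_freqs, voicing_threshold=0.1):
    """Stage 2 stand-in (hypothetical): per frame, return the frequency of the
    strongest bin; frames whose peak is below the threshold are unvoiced (0 Hz).
    The paper uses an encoder-decoder network here, not a peak picker."""
    n_frames = mag.shape[1]
    peak_bins = np.argmax(mag, axis=0)               # strongest bin per frame
    peak_vals = mag[peak_bins, np.arange(n_frames)]  # magnitude of that bin
    return np.where(peak_vals >= voicing_threshold, bin_freqs[peak_bins], 0.0)


# Toy magnitude spectrograms: rows are frequency bins, columns are frames.
bin_freqs = np.array([110.0, 220.0, 330.0, 440.0])
accomp = np.zeros((4, 3))
accomp[0, :] = 1.0   # loud accompaniment at 110 Hz in every frame
vocal = np.zeros((4, 3))
vocal[1, 0] = 0.6    # voice at 220 Hz in frame 0
vocal[2, 1] = 0.6    # voice at 330 Hz in frame 1; frame 2 has no voice
mix = vocal + accomp

# Oracle ideal ratio mask, standing in for a trained separation network.
mask = vocal / (mix + 1e-8)

f0_direct = estimate_f0(mix, bin_freqs)
f0_separated = estimate_f0(separate_vocals(mix, mask), bin_freqs)
print(f0_direct.tolist())     # [110.0, 110.0, 110.0]
print(f0_separated.tolist())  # [220.0, 330.0, 0.0]
```

Without separation every frame reports the accompaniment's 110 Hz, including the frame where the voice is silent; after masking, the estimate follows the voice and marks the silent frame as unvoiced.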

Citation (APA)

Gao, Y., Zhang, X., & Li, W. (2021). Vocal melody extraction via HRNet-based singing voice separation and encoder-decoder-based F0 estimation. Electronics (Switzerland), 10(3), 1–14. https://doi.org/10.3390/electronics10030298
