Various musical descriptors have been developed for cover song identification (CSI). However, different descriptors rest on different assumptions, are designed to capture distinct characteristics of music, and often differ in scale and noise level. A single similarity function combined with a specific descriptor is therefore generally unable to describe the similarity between songs comprehensively and reliably. In this paper, we propose a two-layer similarity fusion model for CSI that combines the information carried by different descriptors and similarity functions and incorporates the advantages of both early and late fusion. In the early-fusion layer, the similarities obtained from the same descriptor with different similarity functions are integrated using the Similarity Network Fusion (SNF) technique. In the late-fusion layer, a learning method selected by the sparse group LASSO algorithm is applied to each early-fused similarity to estimate the probability that the corresponding song pair is a reference/cover pair. The final fused similarity is then obtained by averaging all of these probabilities. Extensive experiments on a music collection composed of samples from the SecondHandSongs (SHS) dataset verify that the proposed scheme outperforms state-of-the-art fusion-based CSI schemes in terms of identification accuracy and classification efficiency.
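The two-layer pipeline described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a simplified SNF update (row-normalized full kernels plus k-nearest-neighbour local kernels) for the early fusion, and plain probability averaging for the late fusion; the classifier training and sparse-group-LASSO model selection are omitted, and all function names and parameters here are illustrative assumptions.

```python
import numpy as np

def row_normalize(W):
    # Full kernel P: each row sums to 1
    return W / W.sum(axis=1, keepdims=True)

def knn_kernel(W, k):
    # Local kernel S: keep only each row's k largest similarities
    S = np.zeros_like(W)
    for i, row in enumerate(W):
        idx = np.argsort(row)[-k:]  # indices of the k most similar songs
        S[i, idx] = row[idx]
    return S / S.sum(axis=1, keepdims=True)

def snf(matrices, k=3, iterations=10):
    """Early fusion: merge similarity matrices computed from the same
    descriptor with different similarity functions (simplified SNF)."""
    P = [row_normalize(W) for W in matrices]
    S = [knn_kernel(W, k) for W in matrices]
    for _ in range(iterations):
        P_new = []
        for v in range(len(P)):
            # Diffuse each view through the average of the other views
            others = [P[u] for u in range(len(P)) if u != v]
            P_avg = sum(others) / len(others)
            P_new.append(S[v] @ P_avg @ S[v].T)
        P = [(Pv + Pv.T) / 2 for Pv in P_new]  # keep matrices symmetric
    return sum(P) / len(P)

def late_fuse(probabilities):
    # Late fusion: average the reference/cover probabilities produced by
    # the per-descriptor models (one probability vector per descriptor)
    return np.mean(probabilities, axis=0)
```

In a full system, each early-fused matrix from `snf` would feed a trained classifier whose per-pair cover probabilities are then combined by `late_fuse` into the final similarity.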
Chen, N., Li, M., & Xiao, H. (2017). Two-layer similarity fusion model for cover song identification. EURASIP Journal on Audio, Speech, and Music Processing, 2017(1). https://doi.org/10.1186/s13636-017-0108-2