Cover song identification is an important problem in the field of Music Information Retrieval. Most existing methods rely on hand-crafted features and sequence alignment, and further breakthroughs with them have been hard to achieve. In this paper, Convolutional Neural Networks (CNNs) are used for representation learning for this task. We show that they can be naturally adapted to deal with key transposition in cover songs. Additionally, Temporal Pyramid Pooling is used to extract information at different scales and to transform songs of different lengths into fixed-dimensional representations. Furthermore, a training scheme is designed to enhance the robustness of our model. Extensive experiments demonstrate that, with these techniques combined, our approach is robust against the musical variations found in cover songs and outperforms state-of-the-art methods on several datasets with low time complexity.
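To illustrate how temporal pyramid pooling can map songs of different lengths to a fixed-dimensional representation, the following is a minimal NumPy sketch and not the authors' implementation. The function name `temporal_pyramid_pool`, the pyramid levels `(1, 2, 4)`, the use of max pooling, and the `(channels, time)` input layout are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def temporal_pyramid_pool(features, levels=(1, 2, 4)):
    """Pool a (channels, time) feature map into a fixed-length vector.

    Hypothetical helper: the levels and the max-pooling choice are
    illustrative assumptions, not values taken from the paper.
    """
    channels, time = features.shape
    if time < max(levels):
        raise ValueError("time axis is shorter than the finest pyramid level")

    pooled = []
    for n_bins in levels:
        # Split the time axis into n_bins nearly equal segments and
        # keep the per-channel maximum of each segment.
        for segment in np.array_split(features, n_bins, axis=1):
            pooled.append(segment.max(axis=1))

    # Output length is channels * sum(levels), independent of `time`.
    return np.concatenate(pooled)


# Two feature maps with the same channel count but different durations.
rng = np.random.default_rng(0)
short_song = rng.standard_normal((8, 50))
long_song = rng.standard_normal((8, 173))

print(temporal_pyramid_pool(short_song).shape)
print(temporal_pyramid_pool(long_song).shape)
```

Both calls return a vector of length 8 × (1 + 2 + 4) = 56, so inputs of different durations can be compared directly. The coarsest level summarizes the whole song, while finer levels retain some information about where in time a feature occurs.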
CITATION STYLE
Yu, Z., Xu, X., Chen, X., & Yang, D. (2019). Temporal pyramid pooling convolutional neural network for cover song identification. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2019-August, pp. 4846–4852). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/673