2009 New J. Phys. 11 093017 (http://iopscience.iop.org/1367-2630/11/9/093017)
The open-access journal for physics
New Journal of Physics

Cross recurrence quantification for cover song identification

Joan Serrà¹, Xavier Serra and Ralph G Andrzejak
Department of Information and Communication Technologies, Universitat Pompeu Fabra, Roc Boronat 138, 08018 Barcelona, Spain
E-mail: joan.serraj@upf.edu

New Journal of Physics 11 (2009) 093017 (20pp)
Received 22 July 2009
Published 15 September 2009
Online at http://www.njp.org/
doi:10.1088/1367-2630/11/9/093017

Abstract. There is growing evidence that nonlinear time series analysis techniques can be used to successfully characterize, classify, or process signals derived from real-world dynamics even though these are not necessarily deterministic and stationary. In the present study, we proceed in this direction by addressing an important problem our modern society is facing, the automatic classification of digital information. In particular, we address the automatic identification of cover songs, i.e. alternative renditions of a previously recorded musical piece. For this purpose, we here propose a recurrence quantification analysis measure that allows the tracking of potentially curved and disrupted traces in cross recurrence plots (CRPs). We apply this measure to CRPs constructed from the state space representation of musical descriptor time series extracted from the raw audio signal. We show that our method identifies cover songs with a higher accuracy as compared to previously published techniques. Beyond the particular application proposed here, we discuss how our approach can be useful for the characterization of a variety of signals from different scientific disciplines. We study coupled Rössler dynamics with stochastically modulated mean frequencies as one concrete example to illustrate this point.

¹ Author to whom any correspondence should be addressed.
New Journal of Physics 11 (2009) 093017 1367-2630/09/093017+20$30.00 © IOP Publishing Ltd and Deutsche Physikalische Gesellschaft
Contents

1. Introduction
2. Method
   2.1. Pre-processing
   2.2. State space embedding
   2.3. CRP
   2.4. Recurrence quantification measures for cover song identification
3. Evaluation
   3.1. Evaluation data
   3.2. Evaluation methodology
4. Results
   4.1. Parameter optimization
   4.2. Out-of-sample accuracy
   4.3. Comparison with the state-of-the-art
5. Conclusion
6. Outlook
Acknowledgments
References

1. Introduction

An unprecedented growth in the availability of and access to digital information is taking place in today's society, and music is a paradigmatic example. Online digital music collections are on the order of millions of tracks, and personal collections can easily exceed the practical limits on the time to listen to them [1]. This huge amount of information readily accessible for end users poses major challenges for automatically describing, understanding, searching, retrieving, and organizing musical contents.

Music information retrieval (MIR) is the interdisciplinary research field that deals with these challenges [2]. MIR systems use multiple sources of information: the raw audio signal, symbolic music representations, audio metadata, tags provided by users or experts, music and social networks data, etc. In content-based MIR, much effort is focused on extracting information from the raw audio signal to represent certain musical aspects such as timbre, melody, main tonality, chords, or tempo [1].
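As a rough illustration of how information extracted from a raw audio signal becomes a sequence of values over time, the following minimal sketch slides a short window over a signal and computes one value per frame. The descriptor used here (root-mean-square energy) is only a simple stand-in chosen for brevity, not one of the musical descriptors considered in this article. The function name `frame_rms`, the parameters `window_size` and `hop_size`, and the synthetic test tone are assumptions made for this example.

```python
import math


def frame_rms(signal, window_size, hop_size):
    """Return the RMS energy of each short-time frame of `signal`.

    Hypothetical helper for illustration; RMS energy is a stand-in
    descriptor, not the one used in the article.
    """
    if window_size <= 0 or hop_size <= 0:
        raise ValueError("window_size and hop_size must be positive")
    descriptor = []
    # Slide the window over the signal; a trailing partial frame is dropped.
    for start in range(0, len(signal) - window_size + 1, hop_size):
        frame = signal[start:start + window_size]
        descriptor.append(math.sqrt(sum(x * x for x in frame) / window_size))
    return descriptor


# Synthetic input (assumption): a 440 Hz sine sampled at 8 kHz for one
# second, whose amplitude doubles halfway through.
sample_rate = 8000
signal = [
    (1.0 if n < 4000 else 2.0) * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]

# 100 ms windows with 50% overlap give one descriptor value per frame.
series = frame_rms(signal, window_size=800, hop_size=400)
```

The resulting list `series` is a descriptor time series in the sense used in this introduction: each entry summarizes one short segment of the audio, and the sequence reflects how that property evolves over the recording. In practice, the window and hop sizes trade temporal resolution against the stability of each estimate.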
Usually, these features are computed in a short-time moving window either from a temporal, spectral, or cepstral representation of the audio signal [1], leading to a descriptor time series reflecting the temporal evolution of a given musical aspect. While common MIR strategies characterize these time series by means of statistical modeling or machine learning techniques [3]–[5], raw descriptor time series are used for many tasks such as audio alignment and matching [6], song structure analysis [7], music similarity [8], audio fingerprinting [9], or cover song identification [10]–[18].

A cover song is an alternative version, performance, rendition, or recording of a previously recorded musical piece. While cover songs might differ from their originals in several musical aspects such as timbre, tempo, song structure, main tonality, arrangement, lyrics, or language of the vocals, they resemble their originals with regard to other features. A robust so-called 'mid-level feature' that is largely preserved under the mentioned musical variations is the