The Problem of Limited Inter-rater Agreement in Modelling Music Similarity

41Citations
Citations of this article
49Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

One of the central goals of Music Information Retrieval (MIR) is the quantification of similarity between or within pieces of music. These quantitative relations should mirror the human perception of music similarity, which is however highly subjective with low inter-rater agreement. Unfortunately this principal problem has been given little attention in MIR so far. Since it is not meaningful to have computational models that go beyond the level of human agreement, these levels of inter-rater agreement present a natural upper bound for any algorithmic approach. We will illustrate this fundamental problem in the evaluation of MIR systems using results from two typical application scenarios: (i) modelling of music similarity between pieces of music; (ii) music structure analysis within pieces of music. For both applications, we derive upper bounds of performance which are due to the limited inter-rater agreement. We compare these upper bounds to the performance of state-of-the-art MIR systems and show how the upper bounds prevent further progress in developing better MIR systems.

Cite

CITATION STYLE

APA

Flexer, A., & Grill, T. (2016). The Problem of Limited Inter-rater Agreement in Modelling Music Similarity. Journal of New Music Research, 45(3), 239–251. https://doi.org/10.1080/09298215.2016.1200631

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free