Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance

Abstract

Semantic embeddings play a crucial role in natural language-based information retrieval. Embedding models represent words and contexts as vectors whose spatial configuration is derived from the distribution of words in large text corpora. While such representations are generally very powerful, they might fail to account for fine-grained domain-specific nuances. In this article, we investigate this uncertainty for the domain of characterizations of expressive piano performance. Using a music research dataset of free text performance characterizations and a follow-up study sorting the annotations into clusters, we derive a ground truth for a domain-specific semantic similarity structure. We test five embedding models and their similarity structure for correspondence with the ground truth. We further assess the effects of contextualizing prompts, hubness reduction, cross-modal similarity, and k-means clustering. The quality of embedding models shows great variability with respect to this task; more general models perform better than domain-adapted ones and the best model configurations reach human-level agreement.
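As a rough illustration of what "testing a similarity structure for correspondence with the ground truth" can look like, the sketch below compares pairwise cosine similarities of some embedding vectors with a co-membership matrix built from cluster labels. The abstract does not state the authors' agreement measure, so the Pearson correlation used here is only an assumption. The function names and toy data are hypothetical and are not taken from the paper or its dataset.

```python
import numpy as np


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    vectors = np.asarray(vectors, dtype=float)
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / norms
    return unit @ unit.T


def co_membership_matrix(labels):
    """1.0 where two items share a ground-truth cluster, 0.0 otherwise."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def structure_agreement(vectors, labels):
    """Pearson correlation between embedding similarity and co-membership.

    Only the upper triangle (excluding the diagonal) is used, so each
    pair of items is counted once and self-similarity is ignored.
    This measure is an assumption made for illustration.
    """
    sim = cosine_similarity_matrix(vectors)
    truth = co_membership_matrix(labels)
    upper = np.triu_indices(len(labels), k=1)
    return float(np.corrcoef(sim[upper], truth[upper])[0, 1])


# Toy data (hypothetical): four "annotations" embedded in 2-D,
# sorted by annotators into two clusters.
toy_vectors = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
toy_labels = [0, 0, 1, 1]
agreement = structure_agreement(toy_vectors, toy_labels)
print(f"agreement: {agreement:.3f}")
```

In a real comparison, `toy_vectors` would be replaced by the output of an embedding model for each performance characterization, and `toy_labels` by the clusters from the sorting study. The same score could then be computed for each model or prompt variant.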

Cite

Peter, S. D., Chowdhury, S., Cancino-Chacón, C. E., & Widmer, G. (2023). Are we describing the same sound? An analysis of word embedding spaces of expressive piano performance. In ACM International Conference Proceeding Series (pp. 58–66). Association for Computing Machinery. https://doi.org/10.1145/3632754.3632759