Exploring how a generative AI interprets music


Abstract

We aim to investigate how closely neural networks (NNs) mimic human thinking. As a step in this direction, we study the behavior of artificial neurons that fire most when the input data score high on specific emergent concepts. In this paper, we focus on music, where the emergent concepts are those of rhythm, pitch, and melody as commonly used by humans. As a black box to pry open, we focus on Google’s MusicVAE, a pre-trained NN that handles music tracks by encoding them in terms of 512 latent variables. We show that several hundred of these latent variables are “irrelevant” in the sense that they can be set to zero with minimal impact on the reconstruction accuracy. The remaining few dozen latent variables can be sorted in order of relevance by comparing their variances. We show that the first few most relevant variables, and only those, correlate highly with dozens of human-defined measures that describe rhythm and pitch in music pieces, thereby efficiently encapsulating many of these human-understandable concepts in a few nonlinear variables.
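The analysis described in the abstract can be pictured as a simple pipeline: extract latent codes for a corpus of tracks, rank the 512 latent dimensions by their variance, and correlate the top dimensions with human-defined rhythm and pitch measures. The sketch below, which is not the authors' code, illustrates the ranking and correlation steps with NumPy; the arrays `z` (latent codes) and `features` (human-defined measures) are hypothetical placeholders standing in for data that would be produced by MusicVAE and by feature extraction elsewhere.

```python
# Sketch of the variance-ranking and correlation analysis, assuming latent
# codes have already been extracted with MusicVAE for a corpus of tracks.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 512))        # placeholder: one 512-dim latent vector per track
features = rng.normal(size=(1000, 30))  # placeholder: human-defined rhythm/pitch measures per track

# 1. Rank latent dimensions by variance across the corpus; low-variance
#    dimensions are candidates to be zeroed with little reconstruction loss.
variances = z.var(axis=0)
order = np.argsort(variances)[::-1]     # most "relevant" dimensions first

# 2. Correlate the few most relevant latent dimensions with each human measure.
top_k = 10
corr = np.array([
    [np.corrcoef(z[:, d], features[:, f])[0, 1] for f in range(features.shape[1])]
    for d in order[:top_k]
])  # shape (top_k, n_features); |corr| near 1 means the dimension tracks that concept
print(corr.shape)
```

On real data, inspecting the rows of `corr` would show which of the most relevant latent dimensions align with which human-understandable descriptors, which is the kind of comparison the paper reports.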

Citation (APA)
Barenboim, G., Del Debbio, L., Hirn, J., & Sanz, V. (2024). Exploring how a generative AI interprets music. Neural Computing and Applications, 36(27), 17007–17022. https://doi.org/10.1007/s00521-024-09956-9
