Exploring how a generative AI interprets music


Abstract

We aim to investigate how closely neural networks (NNs) mimic human thinking. As a step in this direction, we study the behavior of artificial neurons that fire most when the input data score high on specific emergent concepts. In this paper, we focus on music, where the emergent concepts are those of rhythm, pitch, and melody as commonly used by humans. As a black box to pry open, we focus on Google’s MusicVAE, a pre-trained NN that handles music tracks by encoding them in terms of 512 latent variables. We show that several hundred of these latent variables are “irrelevant” in the sense that they can be set to zero with minimal impact on the reconstruction accuracy. The remaining few dozen latent variables can be sorted in order of relevance by comparing their variances. We show that the first few most relevant variables, and only those, correlate highly with dozens of human-defined measures that describe rhythm and pitch in music pieces, thereby efficiently encapsulating many of these human-understandable concepts in a few nonlinear variables.
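The analysis described in the abstract can be pictured as a simple pipeline: extract latent codes for a corpus of tracks, rank the 512 latent dimensions by their variance, and correlate the top dimensions with human-defined rhythm and pitch measures. The sketch below, which is not the authors' code, illustrates the ranking and correlation steps with NumPy; the arrays `z` (latent codes) and `features` (human-defined measures) are hypothetical placeholders standing in for data that would be produced by MusicVAE and by feature extraction elsewhere.

```python
# Sketch of the variance-ranking and correlation analysis, assuming latent
# codes have already been extracted with MusicVAE for a corpus of tracks.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=(1000, 512))        # placeholder: one 512-dim latent vector per track
features = rng.normal(size=(1000, 30))  # placeholder: human-defined rhythm/pitch measures per track

# 1. Rank latent dimensions by variance across the corpus; low-variance
#    dimensions are candidates to be zeroed with little reconstruction loss.
variances = z.var(axis=0)
order = np.argsort(variances)[::-1]     # most "relevant" dimensions first

# 2. Correlate the few most relevant latent dimensions with each human measure.
top_k = 10
corr = np.array([
    [np.corrcoef(z[:, d], features[:, f])[0, 1] for f in range(features.shape[1])]
    for d in order[:top_k]
])  # shape (top_k, n_features); |corr| near 1 means the dimension tracks that concept
print(corr.shape)
```

On real data, inspecting the rows of `corr` would show which of the most relevant latent dimensions align with which human-understandable descriptors, which is the kind of comparison the paper reports.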

Citation (APA)
Barenboim, G., Del Debbio, L., Hirn, J., & Sanz, V. (2024). Exploring how a generative AI interprets music. Neural Computing and Applications, 36(27), 17007–17022. https://doi.org/10.1007/s00521-024-09956-9
