In this work, we study the use of convolutional neural networks for genre recognition in symbolically represented music. Specifically, we explore the effects of changing network depth, width and kernel sizes while keeping the number of trainable parameters and each block’s receptive field constant. We propose an architecture for handling MIDI data that makes use of multiple resolutions of the input, called Multiple Sequence Resolution Network (MuSeReNet). These networks accept multiple inputs, each at half the original sequence length, representing information at a lower resolution. In our experiments, these networks outperform the state of the art for MIDI genre recognition on the topMAGD and MASD datasets. Finally, we adapt various post hoc explainability methods to the domain of symbolic music and attempt to explain the predictions of our best-performing network.
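The multi-resolution input idea can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a binary piano-roll representation (time steps by pitches), assumes that each lower-resolution input is obtained by max-pooling adjacent pairs of time steps, and assumes that successive levels halve the length of the previous level. The names `halve_resolution` and `multi_resolution_inputs` are hypothetical.

```python
from typing import List

# Assumed representation: a piano roll as time steps x pitches with 0/1 entries.
PianoRoll = List[List[int]]


def halve_resolution(roll: PianoRoll) -> PianoRoll:
    """Max-pool adjacent pairs of time steps, halving the sequence length.

    A trailing unpaired time step (odd length) is dropped.
    Pooling by max is an assumption; averaging would be another option.
    """
    pooled = []
    for t in range(0, len(roll) - 1, 2):
        pooled.append([max(a, b) for a, b in zip(roll[t], roll[t + 1])])
    return pooled


def multi_resolution_inputs(roll: PianoRoll, levels: int) -> List[PianoRoll]:
    """Return `levels` inputs: the original roll, then successively halved copies."""
    inputs = [roll]
    for _ in range(levels - 1):
        inputs.append(halve_resolution(inputs[-1]))
    return inputs


# Toy example: 8 time steps, 3 pitches.
example_roll = [
    [1, 0, 0], [0, 0, 0],
    [0, 1, 0], [0, 1, 0],
    [0, 0, 0], [0, 0, 0],
    [0, 0, 1], [1, 0, 0],
]
example_inputs = multi_resolution_inputs(example_roll, levels=3)
```

In a network of this kind, each element of `example_inputs` would feed a separate input branch, so that coarser inputs summarize longer time spans with the same kernel size.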
Dervakos, E., Kotsani, N., & Stamou, G. (2023). Genre Recognition from Symbolic Music with CNNs: Performance and Explainability. SN Computer Science, 4(2). https://doi.org/10.1007/s42979-022-01490-6