Improved Acoustic Modeling for Automatic Piano Music Transcription Using Echo State Networks


Abstract

Automatic music transcription (AMT) is one of the most challenging problems in Music Information Retrieval, with the goal of generating a score-like representation of a polyphonic audio signal. Typically, the starting point of AMT is an acoustic model that computes note likelihoods from feature vectors. In this work, we evaluate the capabilities of Echo State Networks (ESNs) for acoustic modeling of piano music. Our experiments show that the ESN-based models outperform state-of-the-art Convolutional Neural Networks (CNNs) by an absolute improvement of 0.5 F1-score without using an extra language model. We also show that a two-layer ESN, which mimics a hybrid acoustic and language model, outperforms the best reference approach, which combines Invertible Neural Networks (INNs) with a biGRU language model, by an absolute improvement of 0.91 F1-score.
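To illustrate the kind of acoustic model the abstract describes, the sketch below shows a minimal ESN in NumPy: a fixed random reservoir driven by input feature frames, with only a linear (ridge-regression) readout trained to produce per-note likelihoods. All dimensions, data, and hyperparameters (reservoir size, leak rate, spectral radius, regularization) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: spectrogram-like input frames, 88 piano keys as outputs.
n_in, n_res, n_out = 64, 500, 88
T = 200  # number of time frames

# Input and recurrent weights are fixed and random; only the readout is trained.
W_in = rng.uniform(-0.1, 0.1, (n_res, n_in))
W = rng.normal(0.0, 1.0, (n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # rescale spectral radius to 0.9

def run_reservoir(U, leak=0.3):
    """Collect leaky-integrated reservoir states for an input sequence U (T x n_in)."""
    x = np.zeros(n_res)
    X = np.zeros((len(U), n_res))
    for t, u in enumerate(U):
        x = (1 - leak) * x + leak * np.tanh(W_in @ u + W @ x)
        X[t] = x
    return X

# Toy data standing in for feature vectors and binary note-activity targets.
U = rng.normal(size=(T, n_in))
Y = (rng.random((T, n_out)) > 0.95).astype(float)

# Closed-form ridge-regression readout: W_out = Y^T X (X^T X + alpha * I)^{-1}
X = run_reservoir(U)
alpha = 1e-3
W_out = Y.T @ X @ np.linalg.inv(X.T @ X + alpha * np.eye(n_res))

# Note likelihoods per frame; thresholding these yields a piano-roll estimate.
likelihoods = X @ W_out.T
print(likelihoods.shape)  # (200, 88)
```

A two-layer variant, as mentioned in the abstract, would feed these frame-wise likelihoods into a second reservoir that smooths them over time, playing the role of a language model.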

Citation (APA)

Steiner, P., Jalalvand, A., & Birkholz, P. (2021). Improved Acoustic Modeling for Automatic Piano Music Transcription Using Echo State Networks. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12862 LNCS, pp. 143–154). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-85099-9_12
