Self-supervised transfer learning from natural images for sound classification

Abstract

We propose transfer learning from natural images to audio-based images using self-supervised learning schemes. Through self-supervised learning, convolutional neural networks (CNNs) can learn general representations of natural images without labels. In this study, a CNN was pre-trained on natural images (ImageNet) via self-supervised learning and subsequently fine-tuned on the target audio samples. Pre-training with the self-supervised learning scheme significantly improved sound classification performance on the following benchmarks: ESC-50, UrbanSound8k, and GTZAN. The network pre-trained via self-supervised learning achieved a level of accuracy similar to that of networks pre-trained using a supervised method that requires labels. We therefore demonstrate that transfer learning from natural images improves performance on audio-related tasks, and that self-supervised learning with natural images is an adequate pre-training scheme in terms of simplicity and effectiveness.
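To make the idea of an "audio-based image" concrete, the sketch below converts a waveform into a three-channel log-spectrogram array of the kind an ImageNet-style CNN could accept as input. It is an illustration only: the abstract does not state which time-frequency representation the authors used, so the plain STFT power spectrogram, the function name `audio_to_image`, the frame and hop sizes, the min-max scaling, and the channel replication are all assumptions made here.

```python
import numpy as np


def audio_to_image(signal, frame_len=256, hop=128, eps=1e-10):
    """Turn a 1-D waveform into a (3, freq, time) log-spectrogram 'image'.

    Hypothetical helper: the representation and all parameters are
    assumptions for illustration, not the paper's preprocessing.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Slice the waveform into overlapping frames: (n_frames, frame_len).
    n_frames = 1 + (signal.size - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )

    # Windowed magnitude spectrum per frame: (n_frames, frame_len // 2 + 1).
    window = np.hanning(frame_len)
    magnitude = np.abs(np.fft.rfft(frames * window, axis=1))

    # Power in decibels; eps avoids log(0) on silent bins.
    log_spec = 10.0 * np.log10(magnitude ** 2 + eps)

    # Min-max scale to [0, 1] so the array resembles image intensities.
    lo, hi = log_spec.min(), log_spec.max()
    if hi > lo:
        scaled = (log_spec - lo) / (hi - lo)
    else:
        scaled = np.zeros_like(log_spec)

    # Put frequency on the vertical axis, then copy the single channel
    # three times to match the RGB input shape of image-trained CNNs.
    image = scaled.T
    return np.repeat(image[np.newaxis, :, :], 3, axis=0)


# Usage: one second of a 1 kHz tone sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)
image = audio_to_image(tone)
print(image.shape)
```

An array in this form could then be passed to a CNN whose weights were pre-trained on natural images, with fine-tuning on the labeled audio data as the abstract describes. The pre-training and fine-tuning steps themselves are not shown here.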

Cite

Shin, S., Kim, J., Yu, Y., Lee, S., & Lee, K. (2021). Self-supervised transfer learning from natural images for sound classification. Applied Sciences (Switzerland), 11(7). https://doi.org/10.3390/app11073043
