A Comparison on Data Augmentation Methods Based on Deep Learning for Audio Classification

Citations: 61 · Mendeley readers: 110

This article is free to access.

Abstract

Deep learning focuses on the representation of the input data and the generalization of the model. It is well known that data augmentation can combat overfitting and improve the generalization ability of deep neural networks. In this paper, we summarize and compare multiple data augmentation methods for audio classification. These strategies include traditional transformations of the raw audio signal as well as currently popular augmentations based on linear interpolation and nonlinear mixing of spectrograms. For each data augmentation method, we examine how new samples are generated, how labels are transformed, and how samples and labels are combined. Finally, inspired by SpecAugment and Mixup, we propose an effective and easy-to-implement data augmentation method, which we call Mixed frequency Masking. This method constructs new samples with a nonlinear combination and constructs labels with a linear combination. All methods are evaluated on the Freesound Dataset Kaggle2018 dataset, with ResNet adopted as the classifier. The baseline system uses the log-mel spectrogram feature as the input, and mean Average Precision @3 (mAP@3) is used as the evaluation metric for all data augmentation methods.
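To make the idea behind the proposed method concrete, the sketch below illustrates one plausible reading of a SpecAugment/Mixup hybrid as described in the abstract: frequency bands of one log-mel spectrogram are replaced by the corresponding bands of a second spectrogram (a nonlinear sample combination), while the labels are mixed linearly in proportion to the masked area. The function name, parameters, and band-selection scheme are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def mixed_frequency_masking(spec_a, spec_b, label_a, label_b,
                            num_bands=2, max_width=16, rng=None):
    """Hypothetical sketch of a Mixup/SpecAugment-style augmentation.

    spec_a, spec_b: log-mel spectrograms of shape (n_mels, n_frames)
    label_a, label_b: one-hot (or multi-hot) label vectors
    """
    rng = rng or np.random.default_rng()
    mixed = spec_a.copy()
    n_mels = spec_a.shape[0]

    masked = 0
    for _ in range(num_bands):
        # Pick a random frequency band and overwrite it with spec_b's content.
        width = int(rng.integers(1, max_width + 1))
        start = int(rng.integers(0, max(1, n_mels - width)))
        mixed[start:start + width, :] = spec_b[start:start + width, :]
        masked += width

    # Linear label combination, weighted by the fraction of spec_a that remains.
    lam = max(0.0, 1.0 - masked / n_mels)
    mixed_label = lam * np.asarray(label_a) + (1.0 - lam) * np.asarray(label_b)
    return mixed, mixed_label
```

In this reading, the spectrogram mixing is "nonlinear" because bands are swapped wholesale rather than interpolated, while the labels are still blended linearly as in Mixup; the exact band widths and counts used in the paper are not given in the abstract.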

Citation (APA)

Wei, S., Zou, S., Liao, F., & Lang, W. (2020). A Comparison on Data Augmentation Methods Based on Deep Learning for Audio Classification. In Journal of Physics: Conference Series (Vol. 1453). Institute of Physics Publishing. https://doi.org/10.1088/1742-6596/1453/1/012085
