Acoustic Classification of Bird Species Using an Early Fusion of Deep Features

14Citations
Citations of this article
44Readers
Mendeley users who have this article in their library.

Abstract

Bird sound classification plays an important role in large-scale temporal and spatial environmental monitoring. In this paper, we investigate both transfer learning and training from scratch for bird sound classification, where pre-trained models are used as feature extractors. Specifically, deep cascade features are extracted from various layers of different pre-trained models, which are then fused to classify bird sounds. A multi-view spectrogram is constructed to characterize bird sounds by simply repeating the spectrogram to make it suitable for pre-trained models. Furthermore, both mixup and pitch shift are applied for augmenting bird sounds to improve the classification performance. Experimental classification on 43 bird species using linear SVM indicates that deep cascade features can achieve the highest balanced accuracy of 90.94% ± 1.53%. To further improve the classification performance, an early fusion method is used by combining deep cascaded features extracted from different pre-trained models. The final best classification balanced accuracy is 94.89% ± 1.35%.

Cite

CITATION STYLE

APA

Xie, J., & Zhu, M. (2023). Acoustic Classification of Bird Species Using an Early Fusion of Deep Features. Birds, 4(1), 138–147. https://doi.org/10.3390/birds4010011

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free