Many real-world time series analysis problems are characterized by low signal-to-noise ratios and compounded by scarce data. Solutions to these types of problems often rely on handcrafted features extracted in the time or frequency domain. Recent high-profile advances in deep learning have improved performance across many application domains; however, they typically rely on large data sets that may not always be available. This paper presents an application of deep learning for acoustic event detection in a challenging, data-scarce, real-world problem. We show that convolutional neural networks (CNNs), operating on wavelet transformations of audio recordings, demonstrate superior performance over conventional classifiers that utilize handcrafted features. Our key result is that wavelet transformations offer a clear benefit over the more commonly used short-time Fourier transform. Furthermore, we show that features, handcrafted for a particular dataset, do not generalize well to other datasets. Conversely, CNNs trained on generic features are able to achieve comparable results across multiple datasets, along with outperforming human labellers. We present our results on the application of both detecting the presence of mosquitoes and the classification of bird species.
CITATION STYLE
Kiskin, I., Zilli, D., Li, Y., Sinka, M., Willis, K., & Roberts, S. (2020). Bioacoustic detection with wavelet-conditioned convolutional neural networks. Neural Computing and Applications, 32(4), 915–927. https://doi.org/10.1007/s00521-018-3626-7
Mendeley helps you to discover research relevant for your work.