Learning acoustic models directly from the raw waveform is an effective approach for Environmental Sound Classification (ESC), where sound events often span a wide range of temporal scales. Convolutional neural network (CNN) based ESC methods have achieved state-of-the-art results. However, their performance depends significantly on the number of convolutional layers and the choice of kernel size in the first convolutional layer. In addition, most existing studies have overlooked the ability of CNNs to learn hierarchical features from environmental sounds. Motivated by these findings, in this paper we design parallel convolutional filters of different sizes in the first convolutional layer to extract multi-time-resolution features, aiming to enhance the feature representation. Inspired by VGG networks, we build our deep CNNs by stacking 1-D convolutional layers with very small filters, except for the first layer. Finally, we extend the model with a multi-level feature aggregation technique to boost the classification performance. Experimental results on UrbanSound8K, ESC-50, and ESC-10 show that the proposed method outperforms state-of-the-art end-to-end methods for environmental sound classification in terms of classification accuracy.
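The abstract gives only the overall design, not implementation details. The following is a minimal PyTorch sketch of that design under assumed hyperparameters: the channel widths, first-layer kernel sizes (11, 51, 101 samples), strides, and number of blocks are all hypothetical choices, not values from the paper. Only the structure follows the abstract: parallel first-layer 1-D convolutions at several temporal resolutions, VGG-style stacks of kernel-3 convolutions, and aggregation of globally pooled features from multiple depths before classification.

```python
import torch
import torch.nn as nn

class MultiChannelESC(nn.Module):
    """Sketch of the described architecture (hyperparameters assumed):
    parallel first-layer 1-D convolutions with different kernel sizes
    over the raw waveform, VGG-style stacks of small (size-3) 1-D
    convolutions, and concatenation of features pooled from several
    depths before the classifier."""

    def __init__(self, n_classes=10, first_kernels=(11, 51, 101)):
        super().__init__()
        # Parallel filters capture multiple temporal resolutions
        # directly from the raw waveform.
        self.front = nn.ModuleList(
            nn.Sequential(
                nn.Conv1d(1, 32, k, stride=4, padding=k // 2),
                nn.BatchNorm1d(32),
                nn.ReLU(),
            )
            for k in first_kernels
        )
        ch = 32 * len(first_kernels)

        # VGG-style block: two kernel-3 convolutions followed by pooling.
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv1d(cin, cout, 3, padding=1), nn.BatchNorm1d(cout), nn.ReLU(),
                nn.Conv1d(cout, cout, 3, padding=1), nn.BatchNorm1d(cout), nn.ReLU(),
                nn.MaxPool1d(4),
            )

        self.block1 = block(ch, 128)
        self.block2 = block(128, 256)
        self.block3 = block(256, 512)
        self.pool = nn.AdaptiveAvgPool1d(1)
        # Multi-level aggregation: classifier sees features from all depths.
        self.fc = nn.Linear(128 + 256 + 512, n_classes)

    def forward(self, wav):  # wav: (batch, 1, samples)
        # Concatenate the multi-resolution front-end outputs channel-wise.
        x = torch.cat([f(wav) for f in self.front], dim=1)
        h1 = self.block1(x)
        h2 = self.block2(h1)
        h3 = self.block3(h2)
        # Globally pool each level, then concatenate for classification.
        feats = [self.pool(h).squeeze(-1) for h in (h1, h2, h3)]
        return self.fc(torch.cat(feats, dim=1))

logits = MultiChannelESC()(torch.randn(2, 1, 32000))  # e.g. 2 s at 16 kHz
```

Odd first-layer kernels with padding k // 2 keep all parallel branches the same length, so their outputs can be concatenated channel-wise without cropping.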
Chong, D., Zou, Y., & Wang, W. (2019). Multi-channel convolutional neural networks with multi-level feature fusion for environmental sound classification. In Lecture Notes in Computer Science (Vol. 11296, pp. 157–168). Springer. https://doi.org/10.1007/978-3-030-05716-9_13