A Deep Attention Model for Environmental Sound Classification from Multi-Feature Data

Abstract

Automated environmental sound recognition has clear engineering benefits: it allows audio to be sorted, curated, and searched. Unlike music and language, environmental sound is heavily contaminated by noise and lacks both the rhythm and melody of music and the semantic sequence of language, which makes it difficult to find common features that are representative of the wide variety of environmental sound signals. To improve the accuracy of environmental sound recognition, this paper proposes a recognition method based on multi-feature parameters and a time–frequency attention module. The method begins with a preprocessing step that extracts multi-feature parameters from the sound; these supplement the phase information lost by the Log-Mel spectrogram used in current mainstream methods and enhance the expressive ability of the input features. A time–frequency attention module with multiple convolutions is then designed to extract attention weights for the input feature spectrogram and to reduce interference from background noise and irrelevant frequency bands in the audio. Comparative experiments were conducted on three benchmark datasets: the environmental sound classification datasets ESC-10 and ESC-50, and UrbanSound8K. The experiments demonstrated that the proposed method performs better than the methods it was compared with.
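The idea of time–frequency attention can be illustrated with a small sketch. The abstract does not give the module's architecture, so the code below is not the authors' method: it replaces the learned convolutions with a simple assumption, namely that attention weights are a sigmoid of each frequency band's and each time frame's mean energy relative to the overall mean. The function name `time_frequency_attention` and the toy spectrogram are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def time_frequency_attention(spec):
    """Re-weight a spectrogram by simple time and frequency attention.

    spec[f][t] is the energy at frequency bin f and time frame t.
    ASSUMPTION: weights come from mean energy, not from learned
    convolutions as in the paper's module.
    """
    n_freq, n_time = len(spec), len(spec[0])
    total_mean = sum(sum(row) for row in spec) / (n_freq * n_time)

    # Frequency attention: one weight per band, from its mean over time.
    freq_w = [sigmoid(sum(row) / n_time - total_mean) for row in spec]

    # Time attention: one weight per frame, from its mean over frequency.
    time_w = [
        sigmoid(sum(spec[f][t] for f in range(n_freq)) / n_freq - total_mean)
        for t in range(n_time)
    ]

    # Apply both weights element-wise; quiet bands and frames are damped most.
    weighted = [
        [spec[f][t] * freq_w[f] * time_w[t] for t in range(n_time)]
        for f in range(n_freq)
    ]
    return weighted, freq_w, time_w


# Toy input: a short event in the middle band, surrounded by low-level noise.
spec = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 5.0, 5.0, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
weighted, freq_w, time_w = time_frequency_attention(spec)
```

In this toy case the band and frames containing the event receive larger weights than the noise-only ones, which is the qualitative behavior the abstract attributes to the attention module. A trained module would learn its weights from data rather than derive them from mean energy.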

Cite

Guo, J., Li, C., Sun, Z., Li, J., & Wang, P. (2022). A Deep Attention Model for Environmental Sound Classification from Multi-Feature Data. Applied Sciences (Switzerland), 12(12). https://doi.org/10.3390/app12125988
