At the speed of sound: Efficient audio scene classification


Abstract

Efficient audio scene classification is essential for smart sensing platforms such as robots, medical monitoring, surveillance, or autonomous vehicles. We propose a retrieval-based scene classification architecture that combines recurrent neural networks and attention to compute embeddings for short audio segments. We train our framework using a custom audio loss function that captures both the relevance of audio segments within a scene and that of sound events within a segment. Using experiments on real audio scenes, we show that we can discriminate audio scenes with high accuracy after listening in for less than a second. This preserves 93% of the detection accuracy obtained after hearing the entire scene.
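The abstract describes computing embeddings for short audio segments with a recurrent encoder and attention. Below is a minimal sketch of such a segment embedder, written as an illustration rather than the authors' actual architecture: the layer sizes, the bidirectional GRU, the attention-pooling scheme, and the log-mel input features are all assumptions made for the example.

```python
# Illustrative sketch of an RNN + attention segment embedder (not the paper's model).
# Assumed input: log-mel frames of a short audio segment, shape (batch, frames, n_mels).
import torch
import torch.nn as nn

class SegmentEmbedder(nn.Module):
    def __init__(self, n_mels=64, hidden=128, embed_dim=128):
        super().__init__()
        # Bidirectional GRU reads the per-frame features of a segment.
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
        # Scalar attention score per time step, used to pool frames into one vector.
        self.attn = nn.Linear(2 * hidden, 1)
        self.proj = nn.Linear(2 * hidden, embed_dim)

    def forward(self, x):                           # x: (batch, frames, n_mels)
        h, _ = self.rnn(x)                          # (batch, frames, 2*hidden)
        w = torch.softmax(self.attn(h), dim=1)      # attention weights over frames
        pooled = (w * h).sum(dim=1)                 # attention-weighted average
        z = self.proj(pooled)
        return nn.functional.normalize(z, dim=-1)   # unit-norm embedding for retrieval

# Example: embed two ~1-second segments of 100 frames with 64 mel bins each.
emb = SegmentEmbedder()(torch.randn(2, 100, 64))    # -> shape (2, 128)
```

In a retrieval-based setup like the one the abstract outlines, such embeddings would be compared (e.g. by cosine similarity) against labeled reference segments to decide the scene class after only a short listening window.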

Citation (APA)

Dong, B., Lumezanu, C., Chen, Y., Song, D., Mizoguchi, T., Chen, H., & Khan, L. (2020). At the speed of sound: Efficient audio scene classification. In ICMR 2020 - Proceedings of the 2020 International Conference on Multimedia Retrieval (pp. 301–305). Association for Computing Machinery. https://doi.org/10.1145/3372278.3390730
