On learning disentangled representation for acoustic event detection

Abstract

Polyphonic acoustic event detection (AED) is a challenging task because the signals of different events overlap in the recorded mixture, and features extracted from the mixture do not match those computed from the same sounds in isolation, leading to suboptimal AED performance. In this paper, we propose a supervised β-VAE model for AED, which adds a novel event-specific disentangling loss to the objective function of disentangled representation learning. By incorporating either latent factor blocks or latent attention into disentangling, the supervised β-VAE learns a set of discriminative features for each event. Extensive experiments on benchmark datasets show that our approach outperforms the current state of the art (the top-1 performers in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 AED challenge). Supervised β-VAE is particularly effective on challenging AED tasks with a large variety of events and imbalanced data.
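For context, the β-VAE objective that the model builds on is the standard VAE evidence lower bound with a weight β on the KL divergence term. A minimal sketch of a supervised variant is shown below; the first two terms follow the published β-VAE formulation, while the event-specific disentangling term and its weight λ are placeholders standing in for the paper's contribution, whose exact form is not given in this abstract:

\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right) - \lambda\, \mathcal{L}_{\mathrm{event}}

Here q_\phi(z \mid x) is the encoder, p_\theta(x \mid z) the decoder, and p(z) the standard Gaussian prior; the objective is maximized over the encoder and decoder parameters, with \mathcal{L}_{\mathrm{event}} (assumed notation) penalizing latent factors that fail to separate by event class.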

Citation (APA)

Gao, L., Mao, Q., Dong, M., Jing, Y., & Chinnam, R. (2019). On learning disentangled representation for acoustic event detection. In MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia (pp. 2006–2014). Association for Computing Machinery. https://doi.org/10.1145/3343031.3351086
