On learning disentangled representation for acoustic event detection

Abstract

Polyphonic acoustic event detection (AED) is a challenging task because the signals of different events overlap in the recorded mixture, and features extracted from the mixture do not match those computed from the same sounds in isolation, leading to suboptimal AED performance. In this paper, we propose a supervised β-VAE model for AED, which adds a novel event-specific disentangling loss to the objective function of disentangled representation learning. By incorporating either latent factor blocks or latent attention into disentangling, the supervised β-VAE learns a set of discriminative features for each event. Extensive experiments on benchmark datasets show that our approach outperforms the current state of the art (the top-1 performers in the Detection and Classification of Acoustic Scenes and Events (DCASE) 2017 AED challenge). Supervised β-VAE is particularly effective on challenging AED tasks with a large variety of events and imbalanced data.
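For context, the β-VAE objective that the model builds on is the standard VAE evidence lower bound with a weight β on the KL divergence term. A minimal sketch of a supervised variant is shown below; the first two terms follow the published β-VAE formulation, while the event-specific disentangling term and its weight λ are placeholders standing in for the paper's contribution, whose exact form is not given in this abstract:

\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - \beta\, D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right) - \lambda\, \mathcal{L}_{\mathrm{event}}

Here q_\phi(z \mid x) is the encoder, p_\theta(x \mid z) the decoder, and p(z) the standard Gaussian prior; the objective is maximized over the encoder and decoder parameters, with \mathcal{L}_{\mathrm{event}} (assumed notation) penalizing latent factors that fail to separate by event class.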

Citation (APA)

Gao, L., Mao, Q., Dong, M., Jing, Y., & Chinnam, R. (2019). On learning disentangled representation for acoustic event detection. In MM 2019 - Proceedings of the 27th ACM International Conference on Multimedia (pp. 2006–2014). Association for Computing Machinery. https://doi.org/10.1145/3343031.3351086
