Effectively modeling and analyzing an auditory scene is crucial to many context-aware and content-based multimedia applications. In this paper, we explore the effectiveness of multiple-layer generative deep neural network models in discovering high-level, highly non-linear probabilistic representations from the acoustic data of unstructured auditory scenes. We first create a compact and representative description of the input audio clip by focusing on the salient regions of the data and modeling their contextual correlations. Next, we employ a deep belief network (DBN) to discover, without supervision, high-level descriptions of the scene audio as the activations of units on the higher hidden layers of the trained DBN model. These activations are finally classified into a scene category either by the discriminative output layer of the DBN or by a separate classifier such as a support vector machine (SVM). Experiments demonstrate the effectiveness of the proposed DBN-based classification approach for auditory scenes.
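The pipeline the abstract describes (unsupervised layer-wise feature learning followed by a separate classifier) can be sketched as follows. This is a minimal illustration, not the authors' implementation: greedy layer-wise training of stacked `BernoulliRBM`s from scikit-learn stands in for DBN pretraining, random vectors stand in for per-clip acoustic features, and the layer sizes and scene categories are hypothetical.

```python
# Hypothetical sketch of DBN-style feature learning + SVM classification.
# Stacked RBMs (greedy layer-wise training) approximate DBN pretraining;
# random data stands in for acoustic features extracted from audio clips.
import numpy as np
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(200, 64)        # stand-in for per-clip acoustic features in [0, 1]
y = rng.randint(0, 4, 200)   # 4 hypothetical scene categories

model = Pipeline([
    # Two hidden layers trained one after another, each on the previous
    # layer's activations -- the greedy layer-wise scheme used for DBNs.
    ("rbm1", BernoulliRBM(n_components=32, n_iter=10, random_state=0)),
    ("rbm2", BernoulliRBM(n_components=16, n_iter=10, random_state=0)),
    # Separate discriminative classifier on the top-layer activations.
    ("svm", SVC(kernel="rbf")),
])
model.fit(X, y)

# High-level representation: activations of the top hidden layer.
codes = model[:-1].transform(X)
print(codes.shape)  # (200, 16)
```

With real data, the input `X` would hold the salient-region descriptions of each audio clip, and the SVM step could be swapped for a discriminative output layer as the abstract mentions.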
CITATION STYLE
Su, F., & Xue, L. (2015). Auditory scene classification with deep belief network. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8935, pp. 348–359). Springer Verlag. https://doi.org/10.1007/978-3-319-14445-0_30