Improving Classification of Basic Spatial Audio Scenes in Binaural Recordings of Music by Deep Learning Approach


Abstract

The paper presents a deep learning algorithm for the automatic classification of basic spatial audio scenes in binaural music recordings. In the proposed method, the binaural audio recordings are first converted to Mel-spectrograms and then classified with a convolutional neural network. The method reached an accuracy of 87%, a 10-percentage-point improvement over the results reported in the literature. It delivers moderate classification accuracy even when single-channel spectrograms are supplied to its input (e.g. solely from the left “ear”), highlighting the importance of monaural cues in spatial perception. The obtained results also emphasize the significance of including multiple frequency bands in the convolution process. Visual inspection of the convolution filter activations reveals that the network performs a complex spectro-temporal sound decomposition, likely including a form of separation of foreground audio content from its background constituents.
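The preprocessing stage described above (binaural audio converted to per-channel Mel-spectrograms before classification) could be sketched as follows. This is an illustrative NumPy-only implementation; the parameter values (FFT size, hop length, 64 mel bands, 44.1 kHz sample rate) are assumptions for the example, not the settings used in the paper.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters spaced evenly on the mel scale.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):                      # rising slope
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):                      # falling slope
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mel_spectrogram(x, sr=44100, n_fft=1024, hop=512, n_mels=64):
    # Frame the signal, apply a Hann window, take the power STFT.
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2
    # Project onto the mel filterbank and convert to dB.
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T
    return 10.0 * np.log10(mel + 1e-10)

# One Mel-spectrogram per channel of a (synthetic) binaural recording:
stereo = np.random.randn(2, 44100)  # 1 s of noise, left/right "ears"
specs = np.stack([mel_spectrogram(ch) for ch in stereo])
print(specs.shape)  # (channels, frames, mel bands)
```

The resulting two-channel spectrogram stack is the kind of input a CNN classifier can consume; feeding only one of the two channels corresponds to the single-channel (monaural) condition discussed in the abstract.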

Citation (APA)

Zieliński, S. K. (2020). Improving Classification of Basic Spatial Audio Scenes in Binaural Recordings of Music by Deep Learning Approach. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12133 LNCS, pp. 291–303). Springer. https://doi.org/10.1007/978-3-030-47679-3_25
