Scattering wavelet hash fingerprints for musical audio recognition

Abstract

Fingerprint design is the cornerstone of audio recognition systems, which aim for robustness and fast retrieval. Short-time Fourier transform and Mel-spectral representations are common choices for this task; however, these extraction methods are unstable under deformations and offer limited spectro-temporal resolution. The scattering wavelet transform (SWT) addresses these limitations by recovering the lost information while ensuring translation invariance and stability. We propose a two-stage feature extraction framework that couples the SWT with a deep Siamese hashing model for musical audio recognition. The final fingerprints are similarity-preserving hashes, and similarity in the projected embedding space is defined by a distance metric. The hashing model is trained on roughly aligned and non-matching audio snippets, with the musical audio represented by a two-layer scattering spectrum. Our framework achieves competitive performance in identifying audio signals superimposed with environmental noise, which models real-world obstacles to music recognition. With a very compact storage footprint (256 bytes/s), we achieve a 98.2% ROC AUC score on the GTZAN dataset.
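
To make the second stage concrete, below is a minimal sketch (not the authors' implementation) of a Siamese hashing head in PyTorch. It assumes each snippet has already been reduced to a fixed-length two-layer scattering feature vector (e.g. with a toolkit such as Kymatio); the input dimension, layer widths, contrastive margin, and the 2048-bit code length (256 bytes, matching the reported per-second footprint only under the assumption of one fingerprint per second) are illustrative choices, not the paper's configuration.

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F

class HashEncoder(nn.Module):
    """Maps a scattering feature vector to a continuous code in [-1, 1];
    sign() of the output gives the binary fingerprint at retrieval time."""
    def __init__(self, in_dim: int = 512, hash_bits: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, hash_bits), nn.Tanh(),  # tanh relaxes the binary constraint
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same, margin: float = 1.0):
    """Pulls codes of matching (roughly aligned) snippets together and pushes
    non-matching pairs at least `margin` apart in the embedding space."""
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1 - same) * F.relu(margin - d).pow(2)).mean()

# Toy training step on random stand-in "scattering spectra".
enc = HashEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-4)
xa, xb = torch.randn(8, 512), torch.randn(8, 512)   # two views of 8 snippets
same = torch.randint(0, 2, (8,)).float()            # 1 = matching pair, 0 = non-matching
opt.zero_grad()
loss = contrastive_loss(enc(xa), enc(xb), same)
loss.backward()
opt.step()

# Retrieval-time fingerprint: sign-binarize and pack to bytes (2048 bits -> 256 bytes).
with torch.no_grad():
    bits = (enc(xa[:1]).squeeze(0) > 0).numpy().astype(np.uint8)
    fingerprint = np.packbits(bits)

At retrieval time, query fingerprints would typically be matched against the database by Hamming distance, which is the standard use of sign-binarized codes in similarity-preserving hashing.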

Cite

APA

Kanalici, E., & Bilgin, G. (2019). Scattering wavelet hash fingerprints for musical audio recognition. International Journal of Innovative Technology and Exploring Engineering, 8(9 Special Issue), 1011–1015. https://doi.org/10.35940/ijitee.I1162.0789S19
