Abstract
Fingerprint design is the cornerstone of audio recognition systems, which aim for robustness and fast retrieval. Short-time Fourier transform and Mel-spectral representations are common choices for this task; however, these extraction methods are unstable and offer limited spectral-spatial resolution. The scattering wavelet transform (SWT) addresses these limitations by recovering lost information while ensuring translation invariance and stability. We propose a two-stage feature extraction framework that couples the SWT with a deep Siamese hashing model for musical audio recognition. The final fingerprints are similarity-preserving hashes, with similarity defined by a distance metric in the projected embedding space. The hashing model is trained on roughly aligned and non-matching audio snippets, each represented by a two-layer scattering spectrum. The proposed framework delivers competitive performance in identifying audio signals superimposed with environmental noise, a realistic obstacle for music recognition. With a very compact storage footprint (256 bytes/s), we achieve a 98.2% ROC AUC score on the GTZAN dataset.
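The similarity-preserving hashing idea in the abstract can be sketched minimally: an embedding (here a stand-in for the Siamese network's output on scattering features) is binarized into a fixed-size bit fingerprint, and matching is done by Hamming distance. This is a toy illustration, not the authors' implementation: the random projection replaces the trained hashing model, and the dimensions (512-d embeddings, 2048 bits = 256 bytes per fingerprint, matching the stated footprint) are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def hash_embedding(emb, proj):
    # Binarize the projected embedding: sign test -> {0,1} bits.
    # In the paper, a trained Siamese model would produce the embedding;
    # here a fixed random projection stands in for it.
    return (emb @ proj > 0).astype(np.uint8)

def hamming(a, b):
    # Hamming distance between two binary fingerprints
    return int(np.count_nonzero(a != b))

# Hypothetical sizes: 512-dim embedding -> 2048-bit (256-byte) fingerprint
proj = rng.standard_normal((512, 2048))

x = rng.standard_normal(512)                  # reference snippet embedding
noisy = x + 0.05 * rng.standard_normal(512)   # roughly aligned, noisy version
other = rng.standard_normal(512)              # non-matching snippet

d_match = hamming(hash_embedding(x, proj), hash_embedding(noisy, proj))
d_other = hamming(hash_embedding(x, proj), hash_embedding(other, proj))
# A matching (noisy) snippet stays much closer in Hamming space
# than an unrelated one.
assert d_match < d_other
```

A trained similarity-preserving hash sharpens this gap further by pulling aligned pairs together and pushing non-matching pairs apart in the embedding space before binarization.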
Kanalici, E., & Bilgin, G. (2019). Scattering wavelet hash fingerprints for musical audio recognition. International Journal of Innovative Technology and Exploring Engineering, 8(9 Special Issue), 1011–1015. https://doi.org/10.35940/ijitee.I1162.0789S19