Dictionary learning based sparse coefficients for audio classification with max and average pooling

62Citations
Citations of this article
46Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Audio classification is an important problem in signal processing and pattern recognition with potential applications in audio retrieval, documentation and scene analysis. Common to general signal classification systems, it involves both training and classification (or testing) stages. The performance of an audio classification system, such as its complexity and classification accuracy, depends highly on the choice of the signal features and the classifiers. Several features have been widely exploited in existing methods, such as the mel-frequency cepstrum coefficients (MFCCs), line spectral frequencies (LSF) and short time energy (STM). In this paper, instead of using these well-established features, we explore the potential of sparse features, derived from the dictionary of signal atoms using sparse coding based on e.g. orthogonal matching pursuit (OMP), where the atoms are adapted directly from audio training data using the K-SVD dictionary learning algorithm. To reduce the computational complexity, we propose to perform pooling and sampling operations on the sparse coefficients. Such operations also help to maintain a unified dimension of the signal features, regardless of the various lengths of the training and testing signals. Using the popular support vector machine (SVM) as the classifier, we examine the performance of the proposed classification system for two binary classification problems, namely speech-music classification and male-female speech discrimination and a multi-class problem, speaker identification. The experimental results show that the sparse (max-pooled and average-pooled) coefficients perform better than the classical MFCCs features, in particular, for noisy audio data. © 2013 Elsevier Inc.

Cite

CITATION STYLE

APA

Zubair, S., Yan, F., & Wang, W. (2013). Dictionary learning based sparse coefficients for audio classification with max and average pooling. Digital Signal Processing: A Review Journal, 23(3), 960–970. https://doi.org/10.1016/j.dsp.2013.01.004

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free