State-of-the-art systems for video concept detection mainly rely on visual features. Some previous approaches have also included audio features, either using low-level features such as mel-frequency cepstral coefficients (MFCC) or exploiting the detection of specific audio concepts. In this paper, we investigate a bag of auditory words (BoAW) approach that models MFCC features in an auditory vocabulary. The resulting BoAW features are combined with state-of-the-art visual features via multiple kernel learning (MKL). Experiments on a large set of 101 video concepts from the MediaMill Challenge show the effectiveness of using BoAW features: The system using BoAW features and a support vector machine with a χ²-kernel is superior to a state-of-the-art audio approach relying on probabilistic latent semantic indexing. Furthermore, it is shown that an early fusion approach degrades detection performance, whereas the combination of auditory and visual bag of words features via MKL yields a relative performance improvement of 9%. © 2012 Springer-Verlag.
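The core BoAW idea described in the abstract (quantizing MFCC frames against a learned auditory vocabulary and classifying the resulting histograms with a χ²-kernel) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the codebook size, the plain k-means clustering, and the `gamma` parameter of the exponential χ² kernel are assumptions for the example.

```python
import numpy as np

def build_codebook(features, k, iters=10, seed=0):
    """Learn an 'auditory vocabulary' with simple k-means (Lloyd's algorithm).

    features: (n_frames, n_dims) array of MFCC vectors; k: vocabulary size.
    A real system would typically use a larger corpus and more iterations.
    """
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each frame to its nearest codeword.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each codeword to the mean of its assigned frames.
        for j in range(k):
            assigned = features[labels == j]
            if len(assigned):
                centers[j] = assigned.mean(axis=0)
    return centers

def boaw_histogram(features, centers):
    """Quantize MFCC frames and return a normalized bag-of-auditory-words histogram."""
    dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    hist = np.bincount(labels, minlength=len(centers)).astype(float)
    return hist / hist.sum()

def chi2_kernel(x, y, gamma=1.0):
    """Exponential chi-square kernel, commonly used with bag-of-words histograms."""
    return np.exp(-gamma * np.sum((x - y) ** 2 / (x + y + 1e-12)))
```

The resulting histograms would then be fed to an SVM with this kernel; in the multimodal setting of the paper, the χ² kernels computed on auditory and visual bag-of-words features are combined via MKL rather than by concatenating the features (early fusion), which the experiments show degrades performance.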
CITATION STYLE
Mühling, M., Ewerth, R., Zhou, J., & Freisleben, B. (2012). Multimodal video concept detection via bag of auditory words and multiple kernel learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7131 LNCS, pp. 40–50). https://doi.org/10.1007/978-3-642-27355-1_7