Violent scene detection using convolutional neural networks and deep audio features


Abstract

Violent scene detection (VSD) in videos has practical significance in applications such as film rating and protecting children from exposure to violent behavior. Most previous VSD systems have relied mainly on visual cues, although acoustic cues can also help detect violent scenes, especially when visual cues are unreliable. In this paper, we focus on exploring acoustic information for violent scene detection. Convolutional Neural Networks (CNNs) have achieved state-of-the-art performance in visual content processing tasks. We therefore investigate using CNNs for violent scene detection based on acoustic information in videos. We apply CNNs in two ways: directly as a classifier, or as a deep acoustic feature extractor. Experimental results on the MediaEval 2015 evaluation dataset show that CNNs are effective both as classifiers and as acoustic feature extractors. Furthermore, fusion of acoustic and visual information significantly improves violent scene detection performance.
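
The abstract does not say how the acoustic and visual information are fused. As a minimal illustration only, the sketch below assumes score-level (late) fusion: each modality produces a per-segment violence score, and the two are combined by a weighted average. The function name `late_fusion`, the `audio_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
def late_fusion(audio_scores, visual_scores, audio_weight=0.5):
    """Weighted average of per-segment violence scores from two modalities.

    Assumption: score-level fusion. The paper's actual fusion method is not
    stated in the abstract.
    """
    if len(audio_scores) != len(visual_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must be in [0, 1]")
    visual_weight = 1.0 - audio_weight
    return [audio_weight * a + visual_weight * v
            for a, v in zip(audio_scores, visual_scores)]


# Hypothetical per-segment scores in [0, 1]; not results from the paper.
audio = [0.9, 0.2, 0.6]
visual = [0.5, 0.4, 0.8]
fused = late_fusion(audio, visual, audio_weight=0.5)
```

In such a scheme, a segment would be flagged as violent when its fused score exceeds a threshold chosen on held-out data, and `audio_weight` would be tuned the same way.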

Cite

Mu, G., Cao, H., & Jin, Q. (2016). Violent scene detection using convolutional neural networks and deep audio features. In Communications in Computer and Information Science (Vol. 663, pp. 451–461). Springer Verlag. https://doi.org/10.1007/978-981-10-3005-5_37
