Violence detection based on spatio-temporal feature and fisher vector

4Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

A novel framework based on local spatio-temporal features and a Bag-of-Words (BoW) model is proposed for violence detection. The framework utilizes Dense Trajectories (DT) and MPEG flow video descriptor (MF) as feature descriptors and employs Fisher Vector (FV) in feature coding. DT and MF algorithms are more descriptive and robust, because they are combinations of various feature descriptors, which describe trajectory shape, appearance, motion and motion boundary, respectively. FV is applied to transform low level features to high level features. FV method preserves much information, because not only the affiliations of descriptors are found in the codebook, but also the first and second order statistics are used to represent videos. Some tricks, that PCA, K-means++ and codebook size, are used to improve the final performance of video classification. In comprehensive consideration of accuracy, speed and application scenarios, the proposed method for violence detection is analysed. Experimental results show that the proposed approach outperforms the state-of-the-art approaches for violence detection in both crowd scenes and non-crowd scenes.

Cite

CITATION STYLE

APA

Cai, H., Jiang, H., Huang, X., Yang, J., & He, X. (2018). Violence detection based on spatio-temporal feature and fisher vector. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11256 LNCS, pp. 180–190). Springer Verlag. https://doi.org/10.1007/978-3-030-03398-9_16

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free