Violence detection in videos has numerous applications, ranging from parental control and child protection to multimedia filtering and retrieval. A number of approaches have been proposed to detect vital cues of violent actions, most of which rely on trajectory-based action recognition techniques. However, these methods model only the general characteristics of human actions and thus cannot adequately capture the specific high-order information of violent actions. They are therefore poorly suited to detecting violence, which is typically intense and correlated with specific scenes. In this paper, we propose a novel framework, multi-stream deep convolutional neural networks, for person-to-person violence detection in videos. In addition to the conventional spatial and temporal streams, we develop an acceleration stream to capture the intensity information usually involved in violent actions. Moreover, a simple and effective score-level fusion strategy is proposed to integrate the multi-stream information. We demonstrate the effectiveness of our method on a typical violence dataset, and extensive experimental results show its superiority over state-of-the-art methods.
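The abstract does not state the exact fusion rule, so the following is only a minimal sketch of what score-level fusion over three streams could look like, assuming each stream outputs per-class logits and that fusion is a weighted average of per-stream softmax scores. The function names, the stream keys, the weights, and the example logits are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(
    stream_logits: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Weighted average of per-stream softmax scores.

    Assumption: the averaging rule and the weights are illustrative;
    the paper's actual fusion strategy may differ.
    """
    if weights is None:
        # Default to equal weights for every stream.
        weights = {name: 1.0 for name in stream_logits}
    total_w = sum(weights[name] for name in stream_logits)
    n_classes = len(next(iter(stream_logits.values())))
    fused = [0.0] * n_classes
    for name, logits in stream_logits.items():
        probs = softmax(logits)
        w = weights[name] / total_w
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


# Hypothetical logits for the classes [non-violent, violent].
logits = {
    "spatial": [0.2, 1.1],
    "temporal": [0.5, 0.9],
    "acceleration": [-0.3, 2.0],
}
fused = fuse_scores(logits)
# Predicted class is the index of the highest fused score.
label = max(range(len(fused)), key=fused.__getitem__)
```

Fusing at the score level keeps each stream independently trainable, since only the final class scores are combined. Any per-stream weights would need to be tuned on validation data.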
Dong, Z., Qin, J., & Wang, Y. (2016). Multi-stream deep networks for person to person violence detection in videos. In Communications in Computer and Information Science (Vol. 662, pp. 517–531). Springer Verlag. https://doi.org/10.1007/978-981-10-3002-4_43