Surveillance cameras generate large volumes of video that can be analyzed automatically. Security systems in schools, hotels, hospitals, and other sensitive areas need to identify violent activities that can cause social, economic, and environmental damage. Detecting the moving objects in each frame is a fundamental phase of video analysis and violence recognition. This article therefore presents a three-step approach in which the separation of frames containing motion information and the detection of violent behavior are handled at two levels of the network. First, the people in the video frames are identified using a convolutional neural network. Second, a sequence of 16 frames containing the identified people is fed into a 3D CNN. Finally, the 3D CNN is optimized with the OpenVINO (Open Visual Inference and Neural Network Optimization) toolkit, which transforms the pre-trained model into an intermediate representation to increase performance. To evaluate the accuracy of the algorithm, two datasets were analyzed: Violence in Movies and Hockey Fight. The results show final accuracies of 99.9% and 96% on these datasets, respectively.
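The hand-off between the first two steps, in which frames with detected people are grouped into 16-frame sequences for the 3D CNN, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `build_clips`, the boolean `person_flags` input standing in for the detector output, the non-overlapping windows, and the dropping of a trailing partial clip are all assumptions, since the abstract states only the clip length.

```python
from typing import List, Sequence

CLIP_LEN = 16  # clip length stated in the abstract


def build_clips(person_flags: Sequence[bool],
                clip_len: int = CLIP_LEN) -> List[List[int]]:
    """Group indices of frames with a detected person into fixed-length clips.

    person_flags[i] is True when the (hypothetical) person detector found
    at least one person in frame i.
    """
    clips: List[List[int]] = []
    current: List[int] = []
    for idx, has_person in enumerate(person_flags):
        if not has_person:
            continue  # assumption: frames without people are skipped
        current.append(idx)
        if len(current) == clip_len:
            clips.append(current)  # one complete clip for the 3D CNN
            current = []
    # assumption: a trailing clip shorter than clip_len is discarded
    return clips


if __name__ == "__main__":
    # 20 frames with people, 5 empty frames, then 15 frames with people
    flags = [True] * 20 + [False] * 5 + [True] * 15
    for clip in build_clips(flags):
        print(clip[0], clip[-1], len(clip))
```

Each returned list holds the frame indices of one clip; in a full pipeline the corresponding frames would be stacked into a single tensor before being passed to the 3D CNN.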
CITATION STYLE
Xu, X., Liao, Z., & Xu, Z. (2023). Violent Physical Behavior Detection using 3D Spatio-Temporal Convolutional Neural Networks. International Journal of Advanced Computer Science and Applications, 14(8), 829–836. https://doi.org/10.14569/IJACSA.2023.0140891