Surveillance cameras are increasingly being used worldwide due to the proliferation of digital video capturing, storage, and processing technologies. However, the large volume of video data generated makes it difficult for humans to perform real-time analysis, and even manual approaches can result in delayed detection of events. Automatic violence detection in surveillance footage has therefore gained significant attention in the scientific community as a way to address this challenge. With the advancement of machine learning algorithms, automatic video recognition tasks such as violence detection have become increasingly feasible. In this study, we investigate the use of smart networks that model the dynamic relationships between actors and/or objects using 3D convolutions to capture both the spatial and temporal structure of the data. We also leverage the knowledge learned by a pre-trained action recognition model for efficient and accurate violence detection in surveillance footage. We extend and evaluate several public datasets featuring diverse and challenging video content to assess the effectiveness of our proposed methods. Our results show that our approach outperforms state-of-the-art methods, achieving approximately a 2% improvement in accuracy with fewer model parameters. Additionally, our experiments demonstrate the robustness of our approach under common compression artifacts encountered in remote server processing applications.
CITATION STYLE
Huszar, V. D., Adhikarla, V. K., Negyesi, I., & Krasznay, C. (2023). Toward Fast and Accurate Violence Detection for Automated Video Surveillance Applications. IEEE Access, 11, 18772–18793. https://doi.org/10.1109/ACCESS.2023.3245521
Mendeley helps you to discover research relevant for your work.