Abstract
Automated detection of aggressive and violent behaviour in videos has immense potential. It enables efficient online content filtering by restricting access to extreme content and, when integrated with security systems, helps to monitor violence in surveillance videos. In this work, a convolutional neural network is combined with the proposed Spatial and Channel-wise Attention-based ConvLSTM encoder (SCan-ConvLSTM). The proposed architecture performs an efficient spatiotemporal fusion of the features extracted from video sequences containing fight scenes. To focus selectively on the most important regions, the blended attention mechanism adjusts the weights of the outputs at different locations and across different channels. This recurrent attention mechanism enhances the sequential refinement of activation maps and boosts model performance. Finally, experimental results are presented which show that the proposed architecture achieves superior results on the benchmark datasets (RWF-2000, Violent-flow, Hockey-fights, and Movies).
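To make the idea of weighting outputs "in different locations and across different channels" concrete, the following is a minimal NumPy sketch of a blended channel and spatial gate applied to a single feature map. It is an illustration under stated assumptions, not the SCan-ConvLSTM formulation: the abstract does not specify the gating equations, so the squeeze-and-gate channel branch, the channel-mean spatial branch, the function names, and the random weights `w1` and `w2` are all hypothetical. The recurrent ConvLSTM part is omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x, w1, w2):
    """Per-channel gate for a feature map x of shape (C, H, W).

    Assumed form: global average pool, small two-layer bottleneck, sigmoid.
    """
    squeezed = x.mean(axis=(1, 2))             # (C,) one summary value per channel
    hidden = np.maximum(0.0, w1 @ squeezed)    # (C // r,) ReLU bottleneck
    return sigmoid(w2 @ hidden)                # (C,) gate values between 0 and 1

def spatial_attention(x):
    """Per-location gate of shape (H, W).

    Assumed form: sigmoid of the mean activation across channels.
    """
    return sigmoid(x.mean(axis=0))

def blended_attention(x, w1, w2):
    """Apply the channel gate first, then the spatial gate."""
    ch = channel_attention(x, w1, w2)
    x_ch = x * ch[:, None, None]               # reweight each channel
    sp = spatial_attention(x_ch)
    return x_ch * sp[None, :, :], ch, sp       # reweight each location

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
C, H, W, R = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // R, C))
w2 = rng.standard_normal((C, C // R))
out, ch, sp = blended_attention(x, w1, w2)
```

In a recurrent setting, a gate of this kind would be recomputed at every time step on the ConvLSTM activations, which is one plausible reading of the "sequential refinement of activation maps" described above.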
Cite
Chaturvedi, K., Dhiman, C., & Vishwakarma, D. K. (2024). Fight detection with spatial and channel wise attention-based ConvLSTM model. Expert Systems, 41(1). https://doi.org/10.1111/exsy.13474