Group activity recognition plays a fundamental role in a variety of applications, e.g. sports video analysis and intelligent surveillance. How to model the spatio-temporal contextual information in a scene still remains a crucial yet challenging issue. We propose a novel attentive semantic recurrent neural network (RNN), dubbed as stagNet, for understanding group activities in videos, based on the spatio-temporal attention and semantic graph. A semantic graph is explicitly modeled to describe the spatial context of the whole scene, which is further integrated with the temporal factor via structural-RNN. Benefiting from the ‘factor sharing’ and ‘message passing’ mechanisms, our model is capable of extracting discriminative spatio-temporal features and capturing inter-group relationships. Moreover, we adopt a spatio-temporal attention model to attend to key persons/frames for improved performance. Two widely-used datasets are employed for performance evaluation, and the extensive results demonstrate the superiority of our method.
CITATION STYLE
Qi, M., Qin, J., Li, A., Wang, Y., Luo, J., & Van Gool, L. (2018). stagNet: An attentive semantic RNN for group activity recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11214 LNCS, pp. 104–120). Springer Verlag. https://doi.org/10.1007/978-3-030-01249-6_7
Mendeley helps you to discover research relevant for your work.