Human action recognition in videos is an important task with a broad range of applications. In this study, we improve the performance of the recurrent attention convolutional neural network (RACNN) by proposing a novel model, "attention-again". Because video frames form a sequence, the region of interest shifts from frame to frame, so attention mechanisms designed for still images cannot be applied directly. The "attention-again" model is a variant of the traditional attention model for recognizing human activities and is embedded in two long short-term memory (LSTM) layers. Unlike the hierarchical LSTM, which changes the LSTM structure to combine the hidden states of the two LSTM layers, our proposal introduces the "attention-again" model so that the LSTM structure remains unchanged. Furthermore, the model not only learns the relations within each frame but also captures the relations among all frames, and these relations guide the next learning stage. As a result, our proposed model outperforms the baseline and is superior to methods evaluated under the same experimental conditions on three benchmark datasets: UCF-11, HMDB-51 and UCF-101. To understand how the model works, we also visualize the regions of interest in the frames.
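To make the idea concrete, below is a minimal PyTorch sketch of how an "attention-again" block between two LSTM layers could be wired up: a first attention pass weights the spatial regions of a frame using the first LSTM's state, and a second pass re-weights the same regions using the second LSTM's state before the second layer updates. The class name, feature shapes, and the linear scoring functions are illustrative assumptions, not the exact formulation published in the paper.

```python
import torch
import torch.nn as nn

class AttentionAgainSketch(nn.Module):
    """Hypothetical sketch of an "attention-again" block between two LSTM layers.

    Assumptions (ours, not the paper's): each frame is a grid of K region
    vectors of size D extracted by a CNN, and both LSTMs share one hidden size.
    """

    def __init__(self, feat_dim, hidden_dim):
        super().__init__()
        self.lstm1 = nn.LSTMCell(feat_dim, hidden_dim)
        self.lstm2 = nn.LSTMCell(feat_dim, hidden_dim)
        # First attention: scores each region against the first LSTM's state.
        self.att1 = nn.Linear(feat_dim + hidden_dim, 1)
        # "Attention-again": re-scores the same regions with the second LSTM's
        # state, so relations learned earlier guide the next learning stage.
        self.att2 = nn.Linear(feat_dim + hidden_dim, 1)

    def forward(self, frames):
        # frames: (T, K, D) -- T frames, K spatial regions, D-dim features
        T, K, D = frames.shape
        h1 = torch.zeros(1, self.lstm1.hidden_size)
        c1 = torch.zeros_like(h1)
        h2 = torch.zeros(1, self.lstm2.hidden_size)
        c2 = torch.zeros_like(h2)
        outputs = []
        for t in range(T):
            regions = frames[t]  # (K, D)
            # ---- first attention pass over the frame's spatial regions ----
            q1 = h1.expand(K, -1)
            w1 = torch.softmax(self.att1(torch.cat([regions, q1], dim=1)), dim=0)
            ctx1 = (w1 * regions).sum(dim=0, keepdim=True)  # (1, D)
            h1, c1 = self.lstm1(ctx1, (h1, c1))
            # ---- attend again: re-weight the same regions with the second
            # LSTM's state before feeding the second layer ----
            q2 = h2.expand(K, -1)
            w2 = torch.softmax(self.att2(torch.cat([regions, q2], dim=1)), dim=0)
            ctx2 = (w2 * regions).sum(dim=0, keepdim=True)
            h2, c2 = self.lstm2(ctx2, (h2, c2))
            outputs.append(h2)
        return torch.stack(outputs)  # (T, 1, hidden_dim)

# Usage with hypothetical sizes: 16 frames, a 7x7 grid of 512-d features each.
model = AttentionAgainSketch(feat_dim=512, hidden_dim=512)
out = model(torch.randn(16, 49, 512))
```

One design point worth noting: because the re-weighting happens outside the recurrent cells, the LSTM layers themselves stay standard, which mirrors the paper's stated goal of avoiding changes to the LSTM structure.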