Action recognition using visual attention with reinforcement learning

Abstract

Human action recognition in videos is a challenging and significant task with a broad range of applications. The advantage of the visual attention mechanism is that it can effectively reduce noise interference by focusing on the relevant parts of the image and ignoring the irrelevant parts. We propose a deep visual attention model with reinforcement learning for this task. We use a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) units as a learning agent. The agent interacts with the video and decides both which frame to look at next and where to locate the most relevant region of the selected video frame. The REINFORCE method is used to learn the agent's decision policy, and back-propagation is used to train the action classifier. The experimental results demonstrate that this glimpse window can focus on important clues. Our model achieves significant performance improvement on the action recognition datasets UCF101 and HMDB51.
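To illustrate the policy-learning component described above, the following is a minimal sketch of a REINFORCE update for a glimpse-location policy. It is not the paper's implementation: the LSTM agent, the video input, and the classifier are replaced by a toy softmax policy over K hypothetical candidate regions, with reward 1 when the chosen region contains the discriminative cue (standing in for a correct classification).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: the agent picks one of K candidate glimpse regions
# per frame via a softmax policy; REINFORCE nudges the policy toward
# regions whose glimpses earned reward (here, reward 1 for region 2).
K = 4                   # number of candidate regions (assumed)
theta = np.zeros(K)     # policy logits (stand-in for the LSTM agent)
lr = 0.1                # learning rate

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(theta, reward_fn):
    probs = softmax(theta)
    a = rng.choice(K, p=probs)   # sample a glimpse location
    r = reward_fn(a)             # e.g. 1 if the classifier was correct
    # REINFORCE gradient of log pi(a): one_hot(a) - probs
    grad = -probs
    grad[a] += 1.0
    return theta + lr * r * grad

# Toy reward: region 2 always contains the action-relevant cue.
reward = lambda a: 1.0 if a == 2 else 0.0
for _ in range(500):
    theta = reinforce_step(theta, reward)

print(softmax(theta).argmax())  # the policy concentrates on region 2
```

In the actual model, the gradient of the log-policy would flow through the LSTM's parameters rather than a bare logit vector, and the reward would come from the action classifier's correctness on the video.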

Citation (APA)

Li, H., Chen, J., Hu, R., Yu, M., Chen, H., & Xu, Z. (2019). Action recognition using visual attention with reinforcement learning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11296 LNCS, pp. 365–376). Springer. https://doi.org/10.1007/978-3-030-05716-9_30
