An Improved Attention-Based Spatiotemporal-Stream Model for Action Recognition in Videos


Abstract

Action recognition is an important yet challenging task in computer vision. The attention mechanism tells not only where to focus but also when to focus, and it plays a key role in extracting discriminative spatial and temporal features for this task. In this paper, we propose an improved spatiotemporal attention model based on the two-stream structure to recognize actions in videos. Specifically, we first extract intra-frame spatial features and inter-frame optical flow features from each video. We then apply an effective attention module, which sequentially infers attention maps along three separate dimensions: channel, spatial, and temporal. After adaptively refining the features with these attention maps, we perform temporal pooling to squeeze the temporal dimension. The resulting spatial and temporal features are then fed into a spatial LSTM and a temporal LSTM, respectively. Finally, we fuse the spatial feature, the temporal feature, and the two-stream fusion feature to classify the actions in videos. Additionally, we collect and construct a new Ping-Pong action dataset from YouTube for a subsequent human-robot interaction task; it contains 2400 labeled videos in 4 categories. We compare the proposed method with other action recognition algorithms and validate its feasibility and effectiveness on the Ping-Pong action dataset and the HMDB51 dataset.
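
To make the sequential channel-spatial-temporal attention concrete, below is a minimal PyTorch sketch of one way such a module could be structured: CBAM-style channel and spatial attention applied per frame, followed by a temporal attention branch and temporal pooling. All layer choices here (reduction ratio, kernel size, per-frame descriptors, and class names) are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of sequential channel/spatial/temporal attention with
# temporal pooling, loosely following the structure the abstract describes.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):  # reduction ratio is an assumption
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):  # x: (B*T, C, H, W)
        avg = self.mlp(x.mean(dim=(2, 3)))   # descriptor from global average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))    # descriptor from global max pooling
        scale = torch.sigmoid(avg + mx).unsqueeze(-1).unsqueeze(-1)
        return x * scale                     # channel-wise feature refinement

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):       # kernel size is an assumption
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):  # x: (B*T, C, H, W)
        pooled = torch.cat([x.mean(dim=1, keepdim=True),
                            x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.conv(pooled))  # (H, W) attention map

class TemporalAttention(nn.Module):
    def __init__(self, time_steps):
        super().__init__()
        self.fc = nn.Linear(time_steps, time_steps)

    def forward(self, x):  # x: (B, T, C, H, W)
        desc = x.mean(dim=(2, 3, 4))                  # one scalar descriptor per frame
        weights = torch.softmax(self.fc(desc), dim=1) # attention over time steps
        return x * weights.view(x.size(0), x.size(1), 1, 1, 1)

class SpatiotemporalAttention(nn.Module):
    """Channel, then spatial, then temporal attention; finally squeeze time."""
    def __init__(self, channels, time_steps):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()
        self.ta = TemporalAttention(time_steps)

    def forward(self, x):  # x: (B, T, C, H, W)
        b, t, c, h, w = x.shape
        y = x.reshape(b * t, c, h, w)                 # fold time into the batch
        y = self.sa(self.ca(y)).reshape(b, t, c, h, w)
        y = self.ta(y)
        return y.mean(dim=1)                          # temporal pooling -> (B, C, H, W)

if __name__ == "__main__":
    feats = torch.randn(2, 8, 64, 14, 14)             # (batch, frames, channels, H, W)
    print(SpatiotemporalAttention(64, 8)(feats).shape)  # torch.Size([2, 64, 14, 14])
```

In this sketch the same module could refine either stream's features before its LSTM; how the paper parameterizes each attention branch and fuses the streams is specified in the full text, not reproduced here.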

Citation (APA)

Liu, D., Ji, Y., Ye, M., Gan, Y., & Zhang, J. (2020). An Improved Attention-Based Spatiotemporal-Stream Model for Action Recognition in Videos. IEEE Access, 8, 61462–61470. https://doi.org/10.1109/ACCESS.2020.2983355
