ASTA-Net: Adaptive Spatio-Temporal Attention Network for Person Re-Identification in Videos

Abstract

The attention mechanism has been widely applied to enhance pedestrian representations for person re-identification in videos. However, most existing methods learn spatial and temporal attention separately and thus ignore the correlation between them. In this work, we propose a novel Adaptive Spatio-Temporal Attention Network (ASTA-Net) that adaptively aggregates spatial and temporal attention features into a discriminative pedestrian representation for person re-identification in videos. Specifically, multiple Adaptive Spatio-Temporal Fusion modules within ASTA-Net explore precise spatio-temporal attention on multi-level feature maps. They first obtain preliminary spatial and temporal attention features from the spatial semantic relations within each frame and the temporal dependencies among non-consecutive frames, and then adaptively aggregate these preliminary attention features on the basis of their correlation. Moreover, an Adjacent-Frame Motion module is designed to explicitly extract motion patterns from the feature-level variation between adjacent frames. Extensive experiments on three widely used datasets, i.e., MARS, iLIDS-VID and PRID2011, demonstrate the effectiveness of the proposed approach.
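
To make the data flow described in the abstract concrete, the following is a toy sketch in plain Python. It is not the authors' implementation. The abstract does not specify how attention scores, the fusion rule, or the motion features are computed, so everything here is an assumption made for illustration: dot-product similarity with a softmax for both attention steps, a sigmoid gate on cosine similarity as the "correlation"-based aggregation, and simple feature differences between adjacent frames as the motion cue. All function names (spatial_attention, temporal_attention, adaptive_fuse, adjacent_motion, clip_descriptor) are hypothetical, and a clip is represented as a list of frames, each frame being a list of region feature vectors.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[k] for w, v in zip(weights, vectors)) for k in range(dim)]

def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def spatial_attention(regions):
    """Assumed form: score each region by its mean similarity to all
    regions of the same frame, then pool regions with softmax weights."""
    scores = [sum(dot(r, q) for q in regions) / len(regions) for r in regions]
    return weighted_sum(softmax(scores), regions)

def temporal_attention(frame_vecs):
    """Assumed form: every frame attends to all frames of the clip,
    including non-consecutive ones, via dot-product similarity."""
    return [
        weighted_sum(softmax([dot(q, k) for k in frame_vecs]), frame_vecs)
        for q in frame_vecs
    ]

def adaptive_fuse(s, t):
    """Assumed form: gate the two features by a sigmoid of their cosine
    similarity, standing in for correlation-based aggregation."""
    norm = math.sqrt(dot(s, s) * dot(t, t))
    cos = dot(s, t) / norm if norm > 0 else 0.0
    gate = 1.0 / (1.0 + math.exp(-cos))
    return [gate * a + (1.0 - gate) * b for a, b in zip(s, t)]

def adjacent_motion(frame_vecs):
    """Feature-level variation between each pair of adjacent frames."""
    return [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(frame_vecs, frame_vecs[1:])
    ]

def clip_descriptor(clip):
    """clip: list of T frames, each a list of N region vectors of length C.
    Returns appearance features concatenated with averaged motion features."""
    spatial = [spatial_attention(frame) for frame in clip]
    temporal = temporal_attention([mean_pool(frame) for frame in clip])
    fused = [adaptive_fuse(s, t) for s, t in zip(spatial, temporal)]
    appearance = mean_pool(fused)
    motion = adjacent_motion(fused)
    avg_motion = mean_pool(motion) if motion else [0.0] * len(appearance)
    return appearance + avg_motion
```

In this sketch a single fusion step operates on one set of region features; the abstract describes multiple fusion modules applied to multi-level feature maps of a network, which the toy code does not attempt to model.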

Cite

Zhu, X., Liu, J., Wu, H., Wang, M., & Zha, Z. J. (2020). ASTA-Net: Adaptive Spatio-Temporal Attention Network for Person Re-Identification in Videos. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 1706–1715). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3413843
