In this paper we consider the problem of video-based person re-identification, the task of associating videos of the same person captured by different, non-overlapping cameras. We propose a Siamese framework in which the video frames of the person to re-identify and those of the candidate are processed by two identical networks, which together produce a similarity score. We introduce an attention mechanism to capture the relevant information both at the frame level (spatial information) and at the video level (temporal information, given by the importance of a specific frame within the sequence). One of the novelties of our approach is the joint, concurrent processing of the frame and video levels, which yields a very simple architecture. Despite this simplicity, our approach achieves better performance than the state of the art on the challenging iLIDS-VID dataset.
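To make the idea of video-level attention inside a Siamese comparison concrete, the following is a minimal sketch and not the architecture of the paper. It assumes that each frame has already been reduced to a small feature vector, that temporal attention is a softmax over dot products with a single vector `query`, and that the similarity score is a cosine similarity. The names `attention_pool`, `siamese_score` and `query`, as well as the toy feature values, are hypothetical; frame-level (spatial) attention and any learned convolutional layers are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Pool per-frame feature vectors into one video-level vector.

    Each frame is weighted by softmax(dot(frame, query)), so frames that
    align with `query` contribute more (an assumed form of temporal attention).
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    pooled = [sum(w * frame[i] for w, frame in zip(weights, frames))
              for i in range(dim)]
    return pooled, weights


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def siamese_score(video_a, video_b, query):
    """Score two videos with the same pooling function and the same parameters.

    Sharing `query` across both branches is what makes the comparison Siamese.
    """
    emb_a, _ = attention_pool(video_a, query)
    emb_b, _ = attention_pool(video_b, query)
    return cosine_similarity(emb_a, emb_b)


# Toy per-frame features (hypothetical values, two frames per video).
probe = [[1.0, 0.0], [0.8, 0.2]]
same_person = [[0.9, 0.1], [1.0, 0.0]]
other_person = [[0.0, 1.0], [0.1, 0.9]]
query = [1.0, 0.0]

score_same = siamese_score(probe, same_person, query)
score_other = siamese_score(probe, other_person, query)
```

With these toy features, `score_same` comes out higher than `score_other`. In a trained system the attention parameters and the feature extractor would be learned jointly; here they are fixed by hand purely for illustration.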
Zamprogno, M., Passon, M., Martinel, N., Serra, G., Lancioni, G., Micheloni, C., … Foresti, G. L. (2019). Video-based convolutional attention for person re-identification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11751 LNCS, pp. 3–14). Springer Verlag. https://doi.org/10.1007/978-3-030-30642-7_1