A proposal-based approach for activity image-to-video retrieval


Abstract

The activity image-to-video retrieval task aims to retrieve videos that contain the same activity as a query image. The task is challenging because videos generally contain many background segments that are irrelevant to the activity. In this paper, we use the R-C3D model to represent each video as a bag of activity proposals, which filters out background segments to some extent. However, each bag still contains noisy proposals. We therefore propose an Activity Proposal-based Image-to-Video Retrieval (APIVR) approach, which incorporates multi-instance learning into a cross-modal retrieval framework to address the proposal noise issue. Specifically, we propose a Graph Multi-Instance Learning (GMIL) module with a graph convolutional layer, and integrate this module with a classification loss, an adversarial loss, and a triplet loss in our cross-modal retrieval framework. Moreover, we propose a geometry-aware triplet loss based on point-to-subspace distance to preserve the structural information of activity proposals. Extensive experiments on three widely used datasets verify the effectiveness of our approach.
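The abstract does not give the exact form of the geometry-aware triplet loss, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that a video's bag of proposal features spans a linear subspace, that the point-to-subspace distance is the norm of the residual left after orthogonally projecting the query image feature onto that subspace, and that the triplet loss is a standard hinge with a margin. The function names and the `margin` value are hypothetical, chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(proposals: Sequence[Sequence[float]], eps: float = 1e-10) -> List[Vector]:
    """Modified Gram-Schmidt over proposal features (assumed to span the video subspace).

    Proposals that are (nearly) linearly dependent on earlier ones are dropped.
    """
    basis: List[Vector] = []
    for p in proposals:
        v = [float(x) for x in p]
        for u in basis:
            c = dot(v, u)  # project the *current* residual, as in modified Gram-Schmidt
            v = [vi - c * ui for vi, ui in zip(v, u)]
        norm = math.sqrt(dot(v, v))
        if norm > eps:
            basis.append([vi / norm for vi in v])
    return basis


def point_to_subspace_distance(x: Sequence[float], proposals: Sequence[Sequence[float]]) -> float:
    """Distance from feature x to the span of the proposals: ||x - U U^T x||."""
    proj = [0.0] * len(x)
    for u in orthonormal_basis(proposals):
        c = dot(x, u)
        proj = [pi + c * ui for pi, ui in zip(proj, u)]
    residual = [xi - pi for xi, pi in zip(x, proj)]
    return math.sqrt(dot(residual, residual))


def subspace_triplet_loss(query: Sequence[float],
                          pos_bag: Sequence[Sequence[float]],
                          neg_bag: Sequence[Sequence[float]],
                          margin: float = 0.5) -> float:
    """Hypothetical hinge triplet loss: pull the query toward the positive video's
    subspace and push it away from the negative video's subspace."""
    d_pos = point_to_subspace_distance(query, pos_bag)
    d_neg = point_to_subspace_distance(query, neg_bag)
    return max(0.0, margin + d_pos - d_neg)
```

For example, a query `[1, 0, 0]` lies inside the plane spanned by the positive bag `[[1, 1, 0], [0, 1, 0]]` (distance about 0) and is at distance 1 from the line spanned by the negative bag `[[0, 0, 1]]`, so with `margin=2.0` the loss is about 1.0. Measuring distance to a subspace, rather than to a single pooled video vector, is one way to let the whole set of proposals, and not just their average, shape the comparison.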

Citation (APA)

Xu, R., Niu, L., Zhang, J., & Zhang, L. (2020). A proposal-based approach for activity image-to-video retrieval. In AAAI 2020 - 34th AAAI Conference on Artificial Intelligence (pp. 12524–12531). AAAI Press. https://doi.org/10.1609/aaai.v34i07.6941
