Activity Image-to-Video Retrieval by Disentangling Appearance and Motion

Abstract

With the rapid emergence of video data, image-to-video retrieval has attracted much attention. There are two types of image-to-video retrieval: instance-based and activity-based. The former aims to retrieve videos containing the same main objects as the query image, while the latter focuses on finding videos that show a similar activity. Since dynamic information plays a significant role in video, we focus on the latter task and explore the motion relation between images and videos. In this paper, we propose a Motion-assisted Activity Proposal-based Image-to-Video Retrieval (MAP-IVR) approach, which disentangles video features into motion features and appearance features and obtains appearance features from images. We then perform image-to-video translation to improve the disentanglement quality. Retrieval is performed in both the appearance feature space and the video feature space. Extensive experiments demonstrate that our MAP-IVR approach remarkably outperforms state-of-the-art approaches on two benchmark activity-based video datasets.
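The abstract states that retrieval runs in both the appearance and video feature spaces but does not say how the two are scored or combined. As a rough illustration only, the sketch below assumes cosine similarity in each space and a weighted sum of the two scores; the function names, the `alpha` weight, and the precomputed feature arrays are all hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to unit length (eps avoids division by zero)."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def rank_videos(img_app, img_vid, vid_app, vid_feat, alpha=0.5):
    """Rank gallery videos for one query image (illustrative assumption, not MAP-IVR itself).

    img_app  : (d_a,)   appearance feature of the query image
    img_vid  : (d_v,)   image feature translated into the video feature space
    vid_app  : (N, d_a) appearance features of the N gallery videos
    vid_feat : (N, d_v) video features of the N gallery videos
    alpha    : assumed weight of the appearance-space score
    """
    # Cosine similarity in the appearance feature space.
    app_sim = l2_normalize(vid_app) @ l2_normalize(img_app)
    # Cosine similarity in the video feature space.
    vid_sim = l2_normalize(vid_feat) @ l2_normalize(img_vid)
    # Assumed fusion rule: a convex combination of the two scores.
    scores = alpha * app_sim + (1.0 - alpha) * vid_sim
    # Video indices ordered from best to worst match.
    order = np.argsort(-scores)
    return order, scores
```

Calling `rank_videos` with the features of one query image and a gallery of videos returns the gallery indices sorted by the fused score. How the features are extracted, disentangled, and translated is outside the scope of this sketch.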

Cite

Liu, L., Li, J., Niu, L., Xu, R., & Zhang, L. (2021). Activity Image-to-Video Retrieval by Disentangling Appearance and Motion. In 35th AAAI Conference on Artificial Intelligence, AAAI 2021 (Vol. 3A, pp. 2145–2153). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/aaai.v35i3.16312
