Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval

38Citations
Citations of this article
26Readers
Mendeley users who have this article in their library.
Get full text

Abstract

Due to the popularity of video contents on the Internet, the information retrieval between videos and texts has attracted broad interest from researchers, which is a challenging cross-modal retrieval task. A common solution is to learn a joint embedding space to measure the cross-modal similarity. However, many existing approaches either pay more attention to textual information, video information, or cross-modal matching methods, but less to all three. We believe that a good video-text retrieval system should take into account all three points, fully exploiting the semantic information of both modalities and considering a comprehensive match. In this paper, we propose a Hierarchical Cross-Modal Graph Consistency Learning Network (HCGC) for video-text retrieval task, which considers multi-level graph consistency for video-text matching. Specifically, we first construct a hierarchical graph representation for the video, which includes three levels from global to local: video, clips and objects. Similarly, the corresponding text graph is constructed according to the semantic relationships among sentence, actions and entities. Then, in order to learn a better match between the video and text graph, we design three types of graph consistency (both direct and indirect): inter-graph parallel consistency, inter-graph cross consistency and intra-graph cross consistency. Extensive experimental results on different video-text datasets demonstrate the effectiveness of our approach on both text-to-video and video-to-text retrieval.

Cite

CITATION STYLE

APA

Jin, W., Zhao, Z., Zhang, P., Zhu, J., He, X., & Zhuang, Y. (2021). Hierarchical Cross-Modal Graph Consistency Learning for Video-Text Retrieval. In SIGIR 2021 - Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 1114–1124). Association for Computing Machinery, Inc. https://doi.org/10.1145/3404835.3462974

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free