Asymmetric Relation Consistency Reasoning for Video Relation Grounding

Abstract

Video relation grounding has attracted growing attention in the fields of video understanding and multimodal learning. Although recent years have seen remarkable progress on this problem, multi-instance confusion and complex temporal reasoning keep it a challenging task. In this paper, we propose a novel Asymmetric Relation Consistency (ARC) reasoning model for video relation grounding. To overcome the multi-instance confusion problem, we propose an asymmetric relation reasoning method and a novel relation consistency loss that ensure the consistency of relationships across multiple instances. To precisely localize the relation instance in its temporal context, we propose a transformer-based relation reasoning module. Our model is trained in a weakly supervised manner. The proposed method was tested on a challenging video relation dataset. Experiments show that our method outperforms state-of-the-art methods by a large margin. Extensive ablation studies further demonstrate the effectiveness of the proposed method.

Cite

Li, H., Wei, P., Li, J., Ma, Z., Shang, J., & Zheng, N. (2022). Asymmetric Relation Consistency Reasoning for Video Relation Grounding. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13695 LNCS, pp. 125–141). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-19833-5_8
