Video Relation Detection Via Multiple Hypothesis Association

Abstract

Video visual relation detection (VidVRD) aims at obtaining not only the trajectories of objects but also the dynamic visual relations between them. It provides abundant information for video understanding and can serve as a bridge between vision and language. Compared with visual relation detection on images, VidVRD requires an additional final step, visual relation association, which associates relation segments along the time dimension into video relations. This step plays an important role in the task but has been less studied. Moreover, visual relation association is difficult because the association process is easily affected by inaccurate tracklet detection and relation prediction in the preceding steps. In this paper, we propose a novel relation association method called Multiple Hypothesis Association (MHA). It maintains multiple possible relation hypotheses during the association process in order to tolerate inaccurate or missing results from the preceding steps and generate more accurate video relations. Experiments on the benchmark datasets (ImageNet-VidVRD and VidOR) show that our method outperforms state-of-the-art methods.
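The abstract only outlines the idea of keeping several hypotheses alive during association. The sketch below is a rough, illustrative take on that idea: per-segment relation predictions are chained into video-level relations while several candidate chains per triplet are retained instead of committing greedily, so a noisy or missing segment does not immediately break an association. All names here (`RelationSegment`, `Hypothesis`, `associate`, the `beam` and `max_gap` parameters) and the compatibility and scoring rules are assumptions made for illustration, not the paper's actual MHA formulation.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Dict

@dataclass
class RelationSegment:
    """A short-term relation prediction on one video segment (assumed format)."""
    start: int                      # segment start frame
    end: int                        # segment end frame
    triplet: Tuple[str, str, str]   # (subject class, predicate, object class)
    score: float                    # prediction confidence
    subj_track_id: int              # id of the subject tracklet
    obj_track_id: int               # id of the object tracklet

@dataclass
class Hypothesis:
    """One candidate video-level relation: a chain of compatible segments."""
    segments: List[RelationSegment] = field(default_factory=list)

    @property
    def score(self) -> float:
        # Average segment confidence as a simple hypothesis score.
        return sum(s.score for s in self.segments) / max(len(self.segments), 1)

def compatible(hyp: Hypothesis, seg: RelationSegment, max_gap: int = 1) -> bool:
    """A segment may extend a hypothesis if the triplet matches and the
    temporal gap to the last segment is small (tolerating missing segments)."""
    if not hyp.segments:
        return True
    last = hyp.segments[-1]
    return (seg.triplet == last.triplet
            and last.end <= seg.start <= last.end + max_gap)

def associate(segments: List[RelationSegment], beam: int = 5) -> List[Hypothesis]:
    """Associate per-segment relations into video relations while keeping up
    to `beam` hypotheses alive per triplet, instead of committing greedily."""
    hypotheses: List[Hypothesis] = []
    for seg in sorted(segments, key=lambda s: s.start):
        extended = False
        for hyp in list(hypotheses):
            if compatible(hyp, seg):
                # Branch: keep the old hypothesis and add an extended copy.
                hypotheses.append(Hypothesis(hyp.segments + [seg]))
                extended = True
        if not extended:
            hypotheses.append(Hypothesis([seg]))
        # Prune to the top-scoring hypotheses for each triplet.
        by_triplet: Dict[Tuple[str, str, str], List[Hypothesis]] = {}
        for h in hypotheses:
            by_triplet.setdefault(h.segments[0].triplet, []).append(h)
        hypotheses = [h for hs in by_triplet.values()
                      for h in sorted(hs, key=lambda h: h.score, reverse=True)[:beam]]
    return sorted(hypotheses, key=lambda h: h.score, reverse=True)
```

In this toy version, branching plus beam-style pruning is what stands in for "multiple hypothesis" association: an ambiguous segment can extend several chains at once, and the decision of which chain survives is deferred until more segments have been seen.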

Cite

APA

Su, Z., Shang, X., Chen, J., Jiang, Y. G., Qiu, Z., & Chua, T. S. (2020). Video Relation Detection Via Multiple Hypothesis Association. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 3127–3135). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3413764
