Activity-driven Weakly-Supervised Spatio-Temporal Grounding from Untrimmed Videos

Abstract

In this paper, we study the problem of weakly-supervised spatio-temporal grounding from raw untrimmed video streams. Given a video and its descriptive sentence, spatio-temporal grounding aims to predict the temporal occurrence and spatial locations of each query object across frames. Our goal is to learn a grounding model in a weakly-supervised fashion, without supervision from either spatial bounding boxes or temporal occurrences during training. Existing methods address grounding in trimmed videos, and their reliance on object tracking easily fails under the frequent camera shot cuts of untrimmed videos. To this end, we propose a novel spatio-temporal multiple instance learning (MIL) framework for untrimmed video grounding. Spatial MIL and temporal MIL are mutually guided to ground each query to specific spatial regions and to the frames in which it occurs. Furthermore, the activity described in the sentence is captured to provide informative contextual cues for region proposal refinement and text representation. We conduct extensive evaluations on the YouCookII and RoboWatch datasets and demonstrate that our method outperforms state-of-the-art methods.
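The abstract describes treating each frame as a bag of region proposals (spatial MIL) and the whole untrimmed video as a bag of frames (temporal MIL), trained only from video-sentence pairs. The sketch below is a minimal, hypothetical illustration of such spatio-temporal MIL scoring, not the paper's actual model: the tensor shapes, the cosine-similarity region scoring, and the softmax temporal attention are all assumptions made for illustration (PyTorch).

# Minimal sketch of spatio-temporal MIL scoring for one query phrase.
# Shapes, similarity measure, and attention scheme are illustrative assumptions,
# not taken from the paper.
import torch
import torch.nn.functional as F

def spatio_temporal_mil_score(region_feats, query_feat):
    """Score a sentence query against an untrimmed video.

    region_feats: (T, R, D) features for R region proposals in each of T frames
    query_feat:   (D,)     embedding of one query object/phrase
    Returns a video-level score plus per-frame attention and per-frame best
    region indices that localize the query in time and space.
    """
    # Region-query similarity (spatial MIL instance scores), shape (T, R).
    sim = F.cosine_similarity(region_feats, query_feat.view(1, 1, -1), dim=-1)

    # Spatial MIL: each frame is a bag of region proposals; keep the best region.
    frame_scores, best_region = sim.max(dim=1)        # (T,), (T,)

    # Temporal MIL: the video is a bag of frames; softly weight frames so that
    # frames where the query is likely present dominate the video-level score.
    frame_attn = torch.softmax(frame_scores, dim=0)   # (T,)
    video_score = (frame_attn * frame_scores).sum()   # scalar

    return video_score, frame_attn, best_region

# Example usage: in weakly-supervised training, the video-level score would be
# paired with a ranking/contrastive loss against mismatched video-sentence
# pairs, so no bounding-box or temporal annotations are required.
T, R, D = 120, 20, 512
regions = torch.randn(T, R, D)
query = torch.randn(D)
score, when, where = spatio_temporal_mil_score(regions, query)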

Cite (APA)

Chen, J., Bao, W., & Kong, Y. (2020). Activity-driven Weakly-Supervised Spatio-Temporal Grounding from Untrimmed Videos. In MM 2020 - Proceedings of the 28th ACM International Conference on Multimedia (pp. 3789–3797). Association for Computing Machinery, Inc. https://doi.org/10.1145/3394171.3413614
