Zero-Shot Video Grounding for Automatic Video Understanding in Sustainable Smart Cities

Ping Wang; Li Sun; Liuan Wang; Jun Sun

Journal Article, Open Access

Sustainability (Switzerland) (2023) 15(1)

DOI: 10.3390/su15010153

Citations: 1

Readers: 7

Abstract

Automatic video understanding is a key technology for promoting urban sustainability. Video grounding, a fundamental component of video understanding, has evolved quickly in recent years, but its use is restricted by high labeling costs and by the performance limitations that a pre-defined training dataset typically imposes. This paper proposes a novel atom-based zero-shot video grounding (AZVG) method that retrieves the video segments corresponding to a given input sentence. Although it is training-free, AZVG is competitive with weakly supervised methods and outperforms unsupervised state-of-the-art (SOTA) methods on the Charades-STA dataset. The method supports flexible queries as well as different video content, and it can play an important role in a wider range of urban living applications.

Citation (APA)

Wang, P., Sun, L., Wang, L., & Sun, J. (2023). Zero-Shot Video Grounding for Automatic Video Understanding in Sustainable Smart Cities. Sustainability (Switzerland), 15(1). https://doi.org/10.3390/su15010153
