WSLLN: Weakly supervised natural language localization networks

Mingfei Gao; Larry S. Davis; Richard Socher; Caiming Xiong

Conference Proceedings (Open Access)

EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (2019) 1481-1487

DOI: 10.18653/v1/d19-1157

44 Citations, 109 Readers

Abstract

We propose weakly supervised language localization networks (WSLLN) to detect events in long, untrimmed videos given language queries. To learn the correspondence between visual segments and texts, most previous methods require the temporal coordinates (start and end times) of events for training, which makes annotation costly. WSLLN relieves the annotation burden by training only on video-sentence pairs, without access to the temporal locations of events. With a simple end-to-end structure, WSLLN simultaneously measures segment-text consistency and performs segment selection conditioned on the text. The results of both are merged and optimized as a video-sentence matching problem. Experiments on ActivityNet Captions and DiDeMo demonstrate that WSLLN achieves state-of-the-art performance.
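To make the "consistency + selection, merged into a matching score" idea concrete, here is a minimal sketch in plain Python. It is not the authors' implementation: it assumes segment and sentence features are already computed and live in a shared embedding space, uses a dot product plus sigmoid as a stand-in for the learned consistency branch, and uses a hypothetical weight vector `w_sel` for the text-conditioned selection branch. The merged video-level score is the selection-weighted sum of per-segment consistencies, trained (by assumption here) with binary cross-entropy where matched video-sentence pairs get label 1 and mismatched pairs get label 0. No temporal boundaries are needed for training; at inference the segment with the highest combined score serves as the localization.

```python
import math
from typing import List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def video_sentence_score(
    segments: List[List[float]], text: List[float], w_sel: List[float]
) -> Tuple[float, List[float], List[float]]:
    """Hypothetical two-branch scorer (stand-in for the learned modules)."""
    # Consistency branch: independent per-segment match probability in [0, 1].
    consistency = [sigmoid(dot(seg, text)) for seg in segments]
    # Selection branch: text-conditioned distribution over segments (sums to 1).
    logits = [dot(w_sel, [s * t for s, t in zip(seg, text)]) for seg in segments]
    selection = softmax(logits)
    # Merge: selection-weighted consistency gives one video-level score.
    score = sum(c * p for c, p in zip(consistency, selection))
    return score, consistency, selection


def bce(score: float, label: int, eps: float = 1e-7) -> float:
    """Binary cross-entropy for video-sentence matching (label 1 = matched)."""
    score = min(max(score, eps), 1.0 - eps)
    return -(label * math.log(score) + (1 - label) * math.log(1.0 - score))


def localize(consistency: List[float], selection: List[float]) -> int:
    """Index of the segment with the highest combined score at inference."""
    combined = [c * p for c, p in zip(consistency, selection)]
    return max(range(len(combined)), key=combined.__getitem__)
```

For example, with two toy segments `[[1.0, 0.0], [0.0, 1.0]]`, a sentence embedding `[2.0, 0.0]`, and `w_sel = [1.0, 1.0]`, the first segment dominates both branches, the merged score is about 0.835, and `localize` returns segment 0. Only the pair-level label enters the loss, which is what lets this style of model train from video-sentence pairs alone.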

Citation (APA)

Gao, M., Davis, L. S., Socher, R., & Xiong, C. (2019). WSLLN: Weakly supervised natural language localization networks. In EMNLP-IJCNLP 2019 - 2019 Conference on Empirical Methods in Natural Language Processing and 9th International Joint Conference on Natural Language Processing, Proceedings of the Conference (pp. 1481–1487). Association for Computational Linguistics. https://doi.org/10.18653/v1/d19-1157
