Jointly modeling vision and language is an emerging research area with many applications, such as video segment retrieval and dense video captioning. In contrast to whole-video retrieval, video segment retrieval uses a natural language query to retrieve a specific segment from within a video. A common approach is to learn a similarity metric between video and language features. In this chapter, we use ensemble learning to train a video segment retrieval model: the ensemble combines several single-stream models to learn a better similarity metric. We evaluate our method on the task of video clip retrieval with the recently proposed Distinct Describable Moments dataset. Extensive experiments show that our approach improves over the state of the art.
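The abstract does not specify how the single-stream scores are combined, so the following is only a minimal illustrative sketch, assuming each single-stream model projects the video and text features into a shared space, similarity is measured with cosine similarity, and the ensemble takes a (possibly weighted) average of the per-model scores. All function and variable names here are hypothetical.

```python
import numpy as np

def cosine_similarity(v, t):
    # Cosine similarity between a video-segment embedding and a text embedding.
    return float(np.dot(v, t) / (np.linalg.norm(v) * np.linalg.norm(t)))

def ensemble_similarity(video_feat, text_feat, models, weights=None):
    """Combine similarity scores from several single-stream models.

    Each model maps (video_feat, text_feat) to a pair of embeddings in a
    shared space; the per-model cosine similarities are averaged
    (optionally with per-model weights). This is an assumed combination
    rule, not necessarily the one used in the chapter.
    """
    if weights is None:
        weights = [1.0 / len(models)] * len(models)
    score = 0.0
    for model, w in zip(models, weights):
        v, t = model(video_feat, text_feat)
        score += w * cosine_similarity(v, t)
    return score

# Toy "single-stream models": fixed linear projections into a shared space.
rng = np.random.default_rng(0)
projections = [(rng.standard_normal((4, 8)), rng.standard_normal((4, 8)))
               for _ in range(3)]
models = [lambda v, t, Pv=Pv, Pt=Pt: (Pv @ v, Pt @ t) for Pv, Pt in projections]

video_feat = rng.standard_normal(8)
text_feat = rng.standard_normal(8)
print(ensemble_similarity(video_feat, text_feat, models))
```

At retrieval time, a query would be scored against every candidate segment of the video with `ensemble_similarity`, and the highest-scoring segment returned.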
Yu, X., Zhang, Y., & Zhang, R. (2020). Cross-modality video segment retrieval with ensemble learning. In Domain Adaptation for Visual Understanding (pp. 65–79). Springer International Publishing. https://doi.org/10.1007/978-3-030-30671-7_5