Open-ended long-form video question answering via adaptive hierarchical reinforced networks

45Citations
Citations of this article
35Readers
Mendeley users who have this article in their library.

Abstract

Open-ended long-form video question answering is challenging problem in visual information retrieval, which automatically generates the natural language answer from the referenced long-form video content according to the question. However, the existing video question answering works mainly focus on the short-form video question answering, due to the lack of modeling the semantic representation of long-form video contents. In this paper, we consider the problem of long-form video question answering from the viewpoint of adaptive hierarchical reinforced encoder-decoder network learning. We propose the adaptive hierarchical encoder network to learn the joint representation of the longform video contents according to the question with adaptive video segmentation. we then develop the reinforced decoder network to generate the natural language answer for open-ended video question answering. We construct a large-scale long-form video question answering dataset. The extensive experiments show the effectiveness of our method.

Cite

CITATION STYLE

APA

Zhao, Z., Zhang, Z., Xiao, S., Yu, Z., Yu, J., Cai, D., … Zhuang, Y. (2018). Open-ended long-form video question answering via adaptive hierarchical reinforced networks. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 3683–3689). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/512

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free