Open-ended long-form video question answering via adaptive hierarchical reinforced networks

Zhou Zhao; Zhu Zhang; Shuwen Xiao; Zhou Yu; Jun Yu; Deng Cai; Fei Wu; Yueting Zhuang

Conference Proceedings

IJCAI International Joint Conference on Artificial Intelligence (2018) 2018-July 3683-3689

DOI: 10.24963/ijcai.2018/512

48 Citations

35 Readers

Abstract

Open-ended long-form video question answering is a challenging problem in visual information retrieval, in which a natural language answer is automatically generated from the referenced long-form video content according to the question. However, existing video question answering work mainly focuses on short-form video question answering, due to the lack of modeling of the semantic representation of long-form video content. In this paper, we consider the problem of long-form video question answering from the viewpoint of adaptive hierarchical reinforced encoder-decoder network learning. We propose an adaptive hierarchical encoder network to learn the joint representation of long-form video content according to the question, with adaptive video segmentation. We then develop a reinforced decoder network to generate the natural language answer for open-ended video question answering. We construct a large-scale long-form video question answering dataset. Extensive experiments show the effectiveness of our method.

Cite

Zhao, Z., Zhang, Z., Xiao, S., Yu, Z., Yu, J., Cai, D., … Zhuang, Y. (2018). Open-ended long-form video question answering via adaptive hierarchical reinforced networks. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2018-July, pp. 3683–3689). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2018/512
