Initialized frame attention networks for video question answering

Kun Gao; Xianglei Zhu; Yahong Han

Conference Proceedings

Communications in Computer and Information Science (2018) 819 349-359

DOI: 10.1007/978-981-10-8530-7_34


Abstract

Video Question Answering (Video QA) is an important and challenging problem in multimedia and computer vision research. In this paper, we propose a novel framework called initialized frame attention networks (IFAN). The framework uses long short-term memory (LSTM) networks to encode the visual information of a video, then initializes the language model with the encoded features. The answer is then predicted from the visual and semantic features. In particular, IFAN integrates a temporal attention mechanism that focuses on the salient video frames associated with the question. To verify the effectiveness of the proposed framework, we conduct experiments on the TACoS dataset, where it achieves good performance on both the hard and easy levels.
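
The temporal attention step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the dot-product score, the function names (`softmax`, `temporal_attention`), and the toy vectors below are assumptions for illustration only, not the authors' implementation. The sketch scores each frame against a question vector, normalizes the scores with a softmax, and returns the attention-weighted frame summary.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(frame_features, question_vector):
    """Weight frames by relevance to the question.

    Assumption: relevance is a plain dot product between each frame
    feature and the question vector; the paper may use a different,
    learned scoring function.
    """
    scores = [
        sum(f * q for f, q in zip(frame, question_vector))
        for frame in frame_features
    ]
    weights = softmax(scores)
    dim = len(frame_features[0])
    # Attention-weighted sum of the frame features (the context vector).
    context = [
        sum(w * frame[d] for w, frame in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, context


# Hypothetical toy data: three frames with 2-dimensional features.
frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
question = [0.0, 2.0]
weights, context = temporal_attention(frames, question)
print(weights)
print(context)
```

In this toy input the second frame aligns best with the question vector, so it receives the largest weight and dominates the context vector. In a full model the frame features would come from the LSTM video encoder and the question vector from the language model, both of which are omitted here.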

Cite

Gao, K., Zhu, X., & Han, Y. (2018). Initialized frame attention networks for video question answering. In Communications in Computer and Information Science (Vol. 819, pp. 349–359). Springer Verlag. https://doi.org/10.1007/978-981-10-8530-7_34
