Initialized frame attention networks for video question answering

Abstract

Video question answering (Video QA) is an important and challenging problem in multimedia and computer vision research. In this paper, we propose a novel framework called initialized frame attention networks (IFAN). The framework uses long short-term memory (LSTM) networks to encode the visual information of a video, then initializes the language model with the encoded features. An answer is then produced from the combined visual and semantic features. In particular, IFAN integrates a temporal attention mechanism that focuses on the salient video frames associated with the question. To verify the effectiveness of the proposed framework, we conduct experiments on the TACoS dataset, where it achieves good performance on both the hard and easy levels.
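
The temporal attention idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the scoring function, so the dot-product score, the function name temporal_attention, and the toy one-hot frame features below are all assumptions chosen for illustration. The sketch only shows how per-frame weights conditioned on a question vector might be computed and used to pool frame features.

```python
import numpy as np

def temporal_attention(frame_feats, question_vec):
    """Weight T frame features by their relevance to a question vector.

    frame_feats: array of shape (T, d), one encoded feature per frame.
    question_vec: array of shape (d,), an encoded question.
    Returns (weights, context): weights has shape (T,) and sums to 1;
    context has shape (d,) and is the weighted sum of frame features.
    """
    # Dot-product relevance score per frame (an assumed scoring function,
    # not taken from the paper).
    scores = frame_feats @ question_vec
    # Numerically stable softmax over the time axis.
    scores = scores - scores.max()
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()
    # Attention-pooled video representation.
    context = weights @ frame_feats
    return weights, context

# Toy data (hypothetical): 5 frames with one-hot 8-dim features,
# and a question vector aligned with frame index 2.
frames = np.eye(5, 8)
question = 2.0 * frames[2]
weights, context = temporal_attention(frames, question)
```

In this toy setup the frame aligned with the question receives the largest weight, so the pooled context vector is dominated by that frame. In a full model the frame features and question vector would presumably come from learned encoders such as the LSTMs mentioned in the abstract.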

Cite

Gao, K., Zhu, X., & Han, Y. (2018). Initialized frame attention networks for video question answering. In Communications in Computer and Information Science (Vol. 819, pp. 349–359). Springer Verlag. https://doi.org/10.1007/978-981-10-8530-7_34
