Video question answering on screencast tutorials

Abstract

This paper presents a new video question answering task on screencast tutorials. We introduce a dataset of question, answer, and context triples drawn from the tutorial videos for a software product. Unlike other video question answering work, all the answers in our dataset are grounded in a domain knowledge base. A one-shot recognition algorithm is designed to extract visual cues, which helps enhance video question answering performance. We also propose several baseline neural network architectures based on various aspects of the video contexts in the dataset. The experimental results demonstrate that our proposed models significantly improve question answering performance by incorporating multi-modal contexts and domain knowledge.

Cite

Zhao, W., Kim, S., Xu, N., & Jin, H. (2020). Video question answering on screencast tutorials. In IJCAI International Joint Conference on Artificial Intelligence (Vol. 2021-January, pp. 1061–1068). International Joint Conferences on Artificial Intelligence. https://doi.org/10.24963/ijcai.2020/148
