Abstract
This survey aims to organize the recent advances in video question answering (VideoQA) and point towards future directions. We first categorize the datasets into: 1) normal VideoQA, multi-modal VideoQA, and knowledge-based VideoQA, according to the modalities invoked in the question-answer pairs, and 2) factoid VideoQA and inference VideoQA, according to the technical challenges in comprehending the questions and deriving the correct answers. We then summarize the VideoQA techniques, including those mainly designed for factoid QA (such as the early spatio-temporal attention-based methods and the recent Transformer-based ones) and those targeted at explicit relation and logic inference (such as neural modular networks, neural symbolic methods, and graph-structured methods). Aside from the backbone techniques, we also delve into specific models and derive common, useful insights for video modeling, question answering, and cross-modal correspondence learning. Finally, we present the research trends of moving beyond factoid VideoQA toward inference VideoQA, as well as toward robustness and interpretability. Additionally, we maintain a repository, https://github.com/VRU-NExT/VideoQA, to keep track of the latest VideoQA papers, datasets, and their open-source implementations if available. With these efforts, we hope this survey will shed light on follow-up VideoQA research.
Citation
Zhong, Y., Xiao, J., Ji, W., Li, Y., Deng, W., & Chua, T. S. (2022). Video Question Answering: Datasets, Algorithms and Challenges. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 (pp. 6439–6455). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.emnlp-main.432