Curriculum Learning for Compositional Visual Reasoning

Abstract

Visual Question Answering (VQA) is a complex task requiring large datasets and expensive training. Neural Module Networks (NMN) first translate the question into a reasoning path, then follow that path to analyze the image and provide an answer. We propose an NMN method that relies on predefined cross-modal embeddings to “warm start” learning on the GQA dataset, then focus on Curriculum Learning (CL) as a way to improve training and make better use of the data. Several difficulty criteria are employed to define CL methods. We show that, with an appropriate selection of the CL method, the cost of training and the amount of training data can be greatly reduced, with limited impact on the final VQA accuracy. Furthermore, we introduce intermediate losses during training and find that this allows the CL strategy to be simplified.
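
To make the curriculum idea concrete, the sketch below orders training examples by a simple difficulty criterion (reasoning-path length) and linearly grows the pool of available examples over epochs. This is a minimal illustration under assumed choices: the difficulty score, the pacing schedule, and all names (VQAExample, curriculum_pool, train_one_epoch) are hypothetical, not the paper's actual implementation.

```python
# Hedged sketch of difficulty-ordered curriculum scheduling for VQA training.
# The difficulty criterion (reasoning-path length) and the linear pacing
# function are illustrative assumptions, not the authors' configuration.

from dataclasses import dataclass
from typing import List


@dataclass
class VQAExample:
    question: str
    image_id: str
    program: List[str]  # reasoning path produced from the question
    answer: str


def difficulty(example: VQAExample) -> int:
    # One possible criterion: longer reasoning paths are considered harder.
    return len(example.program)


def curriculum_pool(dataset: List[VQAExample], epoch: int, total_epochs: int) -> List[VQAExample]:
    """Return the subset of examples available at the given epoch.

    Linear pacing: start from the easiest 20% of the data and expand the
    pool until the whole dataset is used by the final epoch.
    """
    ranked = sorted(dataset, key=difficulty)
    start_fraction = 0.2
    fraction = min(1.0, start_fraction + (1.0 - start_fraction) * epoch / max(1, total_epochs - 1))
    cutoff = max(1, int(fraction * len(ranked)))
    return ranked[:cutoff]


# Usage: at each epoch, train only on the currently available pool.
# for epoch in range(total_epochs):
#     pool = curriculum_pool(train_set, epoch, total_epochs)
#     train_one_epoch(model, pool)  # hypothetical training loop
```

Other difficulty criteria (for example, question length or answer rarity) could be plugged in by swapping the difficulty function; the pacing schedule can likewise be replaced without changing the training loop.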

Citation (APA)

Aissa, W., Ferecatu, M., & Crucianu, M. (2023). Curriculum Learning for Compositional Visual Reasoning. In Proceedings of the International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (Vol. 5, pp. 888–897). Science and Technology Publications, Lda. https://doi.org/10.5220/0011895400003417
