HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language

Shantipriya Parida; Idris Abdulmumin; Shamsuddeen Hassan Muhammad; Aneesh Bose; Guneet Singh Kohli; Ibrahim Said Ahmad; Ketan Kotwal; Sayan Deb Sarkar; Ondřej Bojar; Habeebah Adamu Kakudi

Conference Proceedings (Open Access)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2023) 10162-10183

DOI: 10.18653/v1/2023.findings-acl.646

Abstract

This paper presents HaVQA, the first multimodal dataset for visual question answering (VQA) tasks in the Hausa language. The dataset was created by manually translating 6,022 English question-answer pairs associated with 1,555 unique images from the Visual Genome dataset. As a result, the dataset provides 12,044 gold-standard English-Hausa parallel sentences, translated in a way that ensures their semantic match with the corresponding visual information. We conducted several baseline experiments on the dataset, including visual question answering, visual question elicitation, and text-only and multimodal machine translation.
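The figure of 12,044 parallel sentences follows from each of the 6,022 question-answer pairs contributing two English-Hausa sentence pairs: one for the question and one for the answer (6,022 × 2 = 12,044). The minimal Python sketch that follows illustrates that flattening step. The `QAPair` record layout, its field names, and the `to_parallel_sentences` helper are hypothetical assumptions for illustration only, not the dataset's actual file format or the authors' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class QAPair:
    # Hypothetical record layout: one image-grounded QA pair with its Hausa translation.
    image_id: int
    question_en: str
    question_ha: str
    answer_en: str
    answer_ha: str


def to_parallel_sentences(pairs: List[QAPair]) -> List[Tuple[str, str]]:
    """Flatten QA pairs into (English, Hausa) sentence pairs.

    Each QA pair yields two parallel sentences: the question and the answer.
    """
    sentences: List[Tuple[str, str]] = []
    for p in pairs:
        sentences.append((p.question_en, p.question_ha))
        sentences.append((p.answer_en, p.answer_ha))
    return sentences


if __name__ == "__main__":
    # Placeholder strings stand in for real translations (not taken from HaVQA).
    dummy = [QAPair(i, "en-q", "ha-q", "en-a", "ha-a") for i in range(6022)]
    print(len(to_parallel_sentences(dummy)))  # prints 12044
```

Treating the question and the answer as separate sentence pairs is what makes the translated QA data directly usable as a parallel corpus for the text-only and multimodal machine translation baselines.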

Citation (APA)

Parida, S., Abdulmumin, I., Muhammad, S. H., Bose, A., Kohli, G. S., Ahmad, I. S., … Kakudi, H. A. (2023). HaVQA: A Dataset for Visual Question Answering and Multimodal Research in Hausa Language. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 10162–10183). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.findings-acl.646

References

Deep residual learning for image recognition

Microsoft COCO: Common objects in context

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
