Arabic Image Captioning using Pre-training of Deep Bidirectional Transformers

Abstract

Image captioning is the process of automatically generating a textual description of an image. It has a wide range of applications, such as effective image search, automatic archiving, and even assisting visually impaired people. English image captioning has seen a lot of development lately, while Arabic image captioning is lagging behind. In this work, we developed several Arabic image captioning models and evaluated them with well-established metrics on a public image captioning benchmark. We initialized all models with transformers pre-trained on different Arabic corpora. After initialization, we fine-tuned them on image-caption pairs using a learning method called OSCAR. OSCAR uses object tags detected in images as anchor points to significantly ease the learning of image-text semantic alignments. On this benchmark, our best-performing model scored 0.39, 0.25, 0.15 and 0.092 on BLEU-1, 2, 3 and 4 respectively, an improvement over the previously published scores of 0.33, 0.19, 0.11 and 0.057. Besides additional evaluation metrics, we complemented our scores with a human evaluation of a sample of our output. Our experiments showed that training image captioning models with Arabic captions and English object tags is a working approach, but that a pure Arabic dataset, with Arabic object tags, would be preferable.
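
To make the size of the reported improvement easier to read, the BLEU scores in the abstract can be turned into absolute and relative gains. The short sketch below uses only the eight numbers quoted above. The names `ours`, `previous` and `improvements` are illustrative and are not taken from the paper.

```python
# BLEU-1..4 scores as quoted in the abstract (best model vs. previously published).
ours = {"BLEU-1": 0.39, "BLEU-2": 0.25, "BLEU-3": 0.15, "BLEU-4": 0.092}
previous = {"BLEU-1": 0.33, "BLEU-2": 0.19, "BLEU-3": 0.11, "BLEU-4": 0.057}


def improvements(new, old):
    """Return {metric: (absolute gain, relative gain in percent)}."""
    return {
        name: (new[name] - old[name], 100.0 * (new[name] - old[name]) / old[name])
        for name in new
    }


gains = improvements(ours, previous)
for name, (absolute, relative) in gains.items():
    print(f"{name}: +{absolute:.3f} absolute, +{relative:.1f}% relative")
```

Read this way, the relative gain grows with the n-gram order, from roughly 18% for BLEU-1 to roughly 61% for BLEU-4, although the absolute differences stay small.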

Cite

Emami, J., Nugues, P., Elnagar, A., & Afyouni, I. (2022). Arabic Image Captioning using Pre-training of Deep Bidirectional Transformers. In 15th International Natural Language Generation Conference, INLG 2022 (pp. 40–51). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.inlg-main.4
