Exploring Practical Deep Learning Approaches for English-to-Hindi Image Caption Translation Using Transformers and Object Detectors

Abstract

Most image captions are available only in the few languages that dominate the internet. Machine translation of image captions aims to democratize this information for low-resource languages through automatic translation. Unlike regular machine translation, caption translation can also use the image itself to improve the quality of the translated caption. This work demonstrates various deep learning techniques and approaches for accurate and efficient translation of captions from English to Hindi. Results show that transformer-based approaches outperform sequence-to-sequence approaches across all metrics (around 5–20% higher accuracy scores). Pre-trained transformer-based approaches also resolve ambiguity easily. The results further show that in such low-resource scenarios text-only approaches are sufficient, while multimodal approaches do not improve translation quality. Text-only pre-trained transformers are therefore recommended for most English-to-Hindi image caption translation applications.

Cite

Bisht, P., & Solanki, A. (2022). Exploring Practical Deep Learning Approaches for English-to-Hindi Image Caption Translation Using Transformers and Object Detectors. In Lecture Notes in Electrical Engineering (Vol. 925, pp. 47–60). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-4831-2_5
