Most image captions are available only in the few languages that are prominent on the internet. Machine translation of image captions aims to make this information accessible in low-resource languages through automatic translation. Unlike regular machine translation, the task can also draw on image information to improve the quality of the translated caption. This work demonstrates various deep learning techniques and approaches for translating captions from English to Hindi effectively and efficiently. Results show that transformer-based approaches outperform sequence-to-sequence approaches across all metrics (around 5–20% higher accuracy scores). Further, pre-trained transformer-based approaches are also able to resolve ambiguity easily. Results also show that in such low-resource scenarios, text-only approaches are sufficient, while multimodal approaches do not improve translation quality. Text-only pre-trained transformers are therefore recommended for most English-to-Hindi image caption translation applications.
CITATION STYLE
Bisht, P., & Solanki, A. (2022). Exploring Practical Deep Learning Approaches for English-to-Hindi Image Caption Translation Using Transformers and Object Detectors. In Lecture Notes in Electrical Engineering (Vol. 925, pp. 47–60). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-981-19-4831-2_5