Abstract
Image captioning has become an indispensable tool, and integrated applications that generate descriptions for images rely on deep neural network (DNN) models. Image description is the process of creating a narrative for an image: it entails identifying the key objects in the image and the relationships among them. The ultimate goal is to generate sentences that are both syntactically and semantically accurate and to convert those sentences into speech. We introduce a learning model that describes objects and composes descriptions by combining computer vision and machine translation. The objective of this paper is to identify the objects within an image, discern the connections and interactions among them, formulate descriptive captions, and convert those descriptions into audio form. The dataset used in this study is Flickr8k, and the primary programming language is Python 3. We implement the machine learning technique known as transfer learning, using the model to demonstrate the experiment. This paper also examines the functions and architecture of the networks that are integral to the process. Image caption generation is a crucial component at the intersection of computer vision and natural language processing (NLP). Image description generators have significant potential applications in image segmentation, as exemplified by their use in platforms such as Facebook and Google Photos. These tools can also be extended to analyze video frames, offering substantial automation in interpreting visual content. Notably, they can aid visually impaired individuals, a significant contribution to accessibility and inclusion.
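The abstract names Flickr8k and Python 3 but does not describe how the captions are prepared for training. As a minimal sketch only, and not the authors' method, the snippet below assumes the commonly distributed token-file layout, in which each line is `image_name.jpg#index`, a tab, and a caption. The function names and the `startseq`/`endseq` marker tokens are hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
import string

# Hypothetical marker tokens that delimit each caption for a sequence decoder.
START, END = "startseq", "endseq"


def clean(text):
    """Lowercase, strip punctuation, keep alphabetic words, add markers."""
    table = str.maketrans("", "", string.punctuation)
    words = text.lower().translate(table).split()
    words = [w for w in words if w.isalpha()]
    return " ".join([START, *words, END])


def parse_captions(lines):
    """Group cleaned captions by image name.

    Assumes each line looks like 'name.jpg#k<TAB>caption text'.
    """
    captions = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, _, text = line.partition("\t")
        image_id = key.split("#")[0]
        captions[image_id].append(clean(text))
    return dict(captions)


def build_vocab(captions, min_count=1):
    """Return the sorted list of words seen at least min_count times."""
    counts = Counter(
        word
        for caps in captions.values()
        for cap in caps
        for word in cap.split()
    )
    return sorted(w for w, n in counts.items() if n >= min_count)
```

With a real copy of the dataset, `parse_captions(open(path, encoding="utf-8"))` would read the token file directly. Raising `min_count` trims rare words, which is a common way to keep the decoder's vocabulary small.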
Cite
Dhule, C., Kuralkar, C., Ghaiwat, B., Pillewan, A., & Kakde, P. (2024). Image Caption Generator. In 15th International Conference on Advances in Computing, Control, and Telecommunication Technologies, ACT 2024 (Vol. 1, pp. 2773–2777). Grenze Scientific Society. https://doi.org/10.55041/ijsrem34949