Abstract
Medical image captioning is a prominent research area. Interpreting and captioning medical images is a time-consuming and costly process that typically requires expert support, and the growing volume of medical images makes it difficult for radiologists to handle the workload alone. Automating medical image captioning can reduce this cost and time while assisting radiologists in improving the reliability and accuracy of the generated captions; it also allows less experienced radiologists to benefit from automated support. Despite previous efforts to automate medical image captioning, several issues remain unresolved, including overly detailed captions, difficulty identifying abnormal regions in complex images, and the low accuracy and reliability of some generated captions. To tackle these challenges, we propose a new deep learning model specifically tailored to captioning medical images. Our model extracts features from images and generates meaningful, accurate sentences describing the identified defects. The approach uses a multi-model neural network that closely mimics the human visual system and automatically learns to describe the content of images. The proposed method consists of two stages. In the first stage, the information extraction phase, we employ the YOLOv4 model to efficiently extract medical image features, which are then transformed into a feature vector; this phase focuses primarily on visual recognition using deep neural network techniques. The extracted features are then fed into the second stage, caption generation, where the model produces grammatically correct natural language sentences describing them. The caption generation stage incorporates two sub-models: an object detection and localization model, which extracts information about the objects present in the image and their spatial relationships, and a deep Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM) units, enhanced by an attention mechanism, which generates the sentences. The attention mechanism aligns each word of the description with different objects in the input image during generation. We evaluated the proposed model on the PEIR dataset using several performance metrics, including ROUGE-L, METEOR, and BLEU. The model achieved a BLEU score of 81.78% and a METEOR score of 78.56%, indicating that it surpasses established benchmark models for medical image caption generation. The model was implemented in Python, and a comparison with recent existing models demonstrates its superiority: the high BLEU and METEOR scores show that it excels at producing precise and contextually rich descriptions of medical images. Overall, this model offers a promising solution for automating medical image captioning, addressing the challenges radiologists face in managing their workload and improving the precision and dependability of the generated descriptions.
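To make the two-stage design concrete, the following is a minimal sketch of the caption-generation stage: an LSTM decoder with soft attention over spatial image features, in the spirit of the architecture the abstract describes, assuming PyTorch. This is an illustration, not the authors' code; the YOLOv4 extractor is assumed to supply a feature map of shape (batch, num_regions, feat_dim), and all class names and dimensions are hypothetical.

import torch
import torch.nn as nn

class SoftAttention(nn.Module):
    """Bahdanau-style soft attention over spatial image features (illustrative)."""
    def __init__(self, feat_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, hidden):
        # feats: (batch, num_regions, feat_dim); hidden: (batch, hidden_dim)
        e = self.score(torch.tanh(self.feat_proj(feats) + self.hidden_proj(hidden).unsqueeze(1)))
        alpha = torch.softmax(e, dim=1)        # one weight per image region
        context = (alpha * feats).sum(dim=1)   # weighted sum of region features
        return context, alpha.squeeze(-1)

class AttentionLSTMDecoder(nn.Module):
    """LSTM decoder that attends to image regions at every time step (illustrative)."""
    def __init__(self, vocab_size, feat_dim=512, embed_dim=256, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.attention = SoftAttention(feat_dim, hidden_dim, attn_dim)
        self.lstm = nn.LSTMCell(embed_dim + feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, feats, captions):
        # feats: (batch, num_regions, feat_dim); captions: (batch, steps) token ids
        batch, steps = captions.shape
        h = feats.new_zeros(batch, self.lstm.hidden_size)
        c = feats.new_zeros(batch, self.lstm.hidden_size)
        emb = self.embed(captions)                 # (batch, steps, embed_dim)
        logits = []
        for t in range(steps):
            context, _ = self.attention(feats, h)  # align current word with image regions
            h, c = self.lstm(torch.cat([emb[:, t], context], dim=1), (h, c))
            logits.append(self.out(h))
        return torch.stack(logits, dim=1)          # (batch, steps, vocab_size)

At inference time one would feed each generated token back in at the next step; the attention weights alpha can also be inspected to visualize which image regions each generated word attends to, which is how such models align words with objects.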
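The reported metrics can be computed with standard implementations. Below is a minimal sketch of scoring one generated caption with BLEU, METEOR, and ROUGE-L, assuming NLTK and the rouge-score package; the example captions are invented for illustration and are not from the PEIR dataset.

import nltk
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from nltk.translate.meteor_score import meteor_score
from rouge_score import rouge_scorer

nltk.download("wordnet", quiet=True)  # METEOR relies on WordNet

# Hypothetical reference (ground-truth) and model-generated captions, tokenized.
reference = "there is a large mass in the right upper lobe".split()
hypothesis = "a large mass is seen in the right upper lobe".split()

bleu = sentence_bleu([reference], hypothesis,
                     smoothing_function=SmoothingFunction().method1)
meteor = meteor_score([reference], hypothesis)
rouge_l = rouge_scorer.RougeScorer(["rougeL"]).score(
    " ".join(reference), " ".join(hypothesis))["rougeL"].fmeasure

print(f"BLEU: {bleu:.3f}  METEOR: {meteor:.3f}  ROUGE-L: {rouge_l:.3f}")

In practice these scores are averaged over the whole test split; corpus-level BLEU (nltk's corpus_bleu) is also commonly reported instead of averaged sentence-level BLEU.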
Ravinder, P., & Srinivasan, S. (2024). Automated Medical Image Captioning with Soft Attention-Based LSTM Model Utilizing YOLOv4 Algorithm. Journal of Computer Science, 20(1), 52–68. https://doi.org/10.3844/jcssp.2024.52.68