Image Transformers for Diabetic Retinopathy Detection from Fundus Datasets

Abstract

Diabetic retinopathy (DR) is a major cause of blindness worldwide, and early detection is crucial for preventing severe vision loss. Because timely intervention improves patient outcomes, clinicians advise diabetic patients to undergo periodic retinal screening with fundus cameras, where DR can be identified through distinct retinal biomarkers such as hemorrhages, aneurysms and exudates. These biomarkers can be detected with deep learning techniques such as Convolutional Neural Networks (CNNs), Vision Transformers and Mixer architectures. The objective of this paper is to develop, investigate and identify the architectures and algorithms that improve DR detection. In this paper, 18 pre-trained state-of-the-art open-source models, including ResNet, EfficientNet, BeiT, VOLO, TNT, DeiT, Visformer, CoAT-NET, CaiT, XCiT, Poolformer, Swin, Twin, PiT, MLP-Mixer, ResMLP and ConvMixer, were used for DR detection. A custom classification head containing a Global Average Pooling layer, fully connected layers, dropout, activation functions and a Softmax layer was added to each pre-trained model. The entire architecture was fine-tuned, evaluated and benchmarked on multiple open-source fundus datasets using an NVIDIA GeForce GTX 1080 GPU. Different hyperparameters, including batch size, normalization, dropout, activation functions, optimizers and learning-rate decay schedules, were evaluated to improve model performance. Overall, around 71 experiments were conducted, achieving state-of-the-art F1-scores of 99.3%, 88.7%, 85.25%, 64.16%, 86.52% and 90.53% on APTOS DR detection, APTOS DR grading, Messidor, IDRiD-DR, IDRiD-AMD and AREDS respectively, around 2% better than the previous state of the art. Transformers outperformed CNN- and Mixer-based architectures because of their ability to learn global context and associate the position of biomarkers with other retinal anatomy. F1-scores of the Swin, PiT and Twin models were the highest among all Transformers because of their ability to encode both fine- and coarse-level details of the biomarkers.
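The abstract describes attaching a custom classification head (Global Average Pooling, fully connected layers, dropout, an activation function and Softmax) to each pre-trained backbone. The sketch below shows one plausible PyTorch realization of such a head; the hidden width, dropout rate and ReLU activation are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn


class DRClassificationHead(nn.Module):
    """Head matching the abstract's description: GAP -> FC -> activation
    -> dropout -> FC -> Softmax. Layer sizes are assumptions for illustration."""

    def __init__(self, in_channels: int, num_classes: int,
                 hidden: int = 256, p_drop: float = 0.3):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # Global Average Pooling
        self.fc1 = nn.Linear(in_channels, hidden)  # fully connected layer
        self.act = nn.ReLU()                       # activation function (assumed)
        self.drop = nn.Dropout(p_drop)             # dropout
        self.fc2 = nn.Linear(hidden, num_classes)  # output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(x).flatten(1)                # (B, C, H, W) -> (B, C)
        x = self.drop(self.act(self.fc1(x)))
        return torch.softmax(self.fc2(x), dim=1)   # class probabilities


# Example: a backbone feature map of batch 2, 512 channels, 7x7 spatial size,
# classified into the 5 DR severity grades used by APTOS.
feats = torch.randn(2, 512, 7, 7)
head = DRClassificationHead(in_channels=512, num_classes=5)
probs = head(feats)
```

In practice this head would replace the original classifier of a pre-trained model (e.g. one loaded via `timm`), and the whole network would then be fine-tuned end to end as the paper describes.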

Citation (APA)

Kumar, N. S., Balasubramanian, R. K., & Phirke, M. R. (2023). Image Transformers for Diabetic Retinopathy Detection from Fundus Datasets. Revue d’Intelligence Artificielle, 37(6), 1617–1627. https://doi.org/10.18280/ria.370626
