A Novel Transformer Model With Multiple Instance Learning for Diabetic Retinopathy Classification

Abstract

Diabetic retinopathy (DR) is an irreversible fundus retinopathy. A deep learning-based automated DR diagnosis system can save diagnostic time. While the Transformer has shown superior performance compared to the Convolutional Neural Network (CNN), it typically requires pre-training on large amounts of data. Although Transformer-based DR diagnosis methods may alleviate the problem of limited performance on small-scale retinal datasets by loading pre-trained weights, the size of the input images is restricted to $224\times 224$. The resolution of retinal images captured by fundus cameras is much higher than $224\times 224$, and reducing the resolution for training results in the loss of valuable information. To efficiently utilize high-resolution retinal images, a new Transformer model with multiple instance learning (TMIL) is proposed for DR classification. A multiple instance learning approach is first applied to the retinal images to segment these high-resolution images into $224\times 224$ image patches. Subsequently, a Vision Transformer (ViT) is used to extract features from each patch. A Global Instance Computing Block (GICB) is then designed to calculate inter-instance features. After the global information from the GICB is introduced, the features are used to output the classification results. When using high-resolution retinal images, TMIL can load pre-trained Transformer weights without the model performance being affected by weight interpolation. Experimental results on the APTOS dataset and the Messidor-1 dataset demonstrate that TMIL achieves better classification performance and reduces inference time by 62% compared with directly inputting high-resolution images into ViT, and TMIL achieves the highest classification accuracy compared with current state-of-the-art results. The code will be publicly available at https://github.com/CNMaxYang/TMIL.
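The pipeline described in the abstract (split a high-resolution fundus image into $224\times 224$ instances, extract per-instance features with a pre-trained ViT, exchange information across instances, then classify) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' released implementation: the GICB is stood in for by a standard transformer encoder layer over instance features, mean pooling is used for instance aggregation, and the class names (TMILSketch, to_instances) are illustrative. It assumes PyTorch and the timm library for the pre-trained ViT backbone.

```python
# Minimal sketch of a TMIL-style pipeline (assumptions: PyTorch + timm;
# GICB approximated by a transformer encoder layer; not the authors' code).
import torch
import torch.nn as nn
import timm


class TMILSketch(nn.Module):
    def __init__(self, num_classes: int = 5, feat_dim: int = 768):
        super().__init__()
        # Pre-trained 224x224 ViT used as a per-instance feature extractor
        # (pretrained=True downloads ImageNet weights on first use).
        self.vit = timm.create_model(
            "vit_base_patch16_224", pretrained=True, num_classes=0
        )
        # Stand-in for the Global Instance Computing Block: self-attention
        # across instance features to introduce inter-instance (global) information.
        self.gicb = nn.TransformerEncoderLayer(
            d_model=feat_dim, nhead=8, batch_first=True
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    @staticmethod
    def to_instances(images: torch.Tensor, patch: int = 224) -> torch.Tensor:
        """Split (B, 3, H, W) images into non-overlapping 224x224 instances.
        Assumes H and W are multiples of 224 (resize/pad beforehand otherwise)."""
        b, c, h, w = images.shape
        x = images.unfold(2, patch, patch).unfold(3, patch, patch)
        x = x.permute(0, 2, 3, 1, 4, 5).reshape(b, -1, c, patch, patch)
        return x  # (B, N_instances, 3, 224, 224)

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        inst = self.to_instances(images)          # (B, N, 3, 224, 224)
        b, n = inst.shape[:2]
        feats = self.vit(inst.flatten(0, 1))      # (B*N, feat_dim) per-instance features
        feats = feats.view(b, n, -1)              # (B, N, feat_dim)
        feats = self.gicb(feats)                  # inter-instance features
        pooled = feats.mean(dim=1)                # aggregate instances
        return self.classifier(pooled)            # (B, num_classes) DR grades


if __name__ == "__main__":
    model = TMILSketch(num_classes=5)
    dummy = torch.randn(1, 3, 896, 896)  # high-resolution fundus image -> 16 instances
    print(model(dummy).shape)            # torch.Size([1, 5])
```

Because each instance is exactly $224\times 224$, the pre-trained ViT weights are loaded as-is, with no positional-embedding interpolation, which is the property the abstract highlights.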

Cite (APA)

Yang, Y., Cai, Z., Qiu, S., & Xu, P. (2024). A Novel Transformer Model With Multiple Instance Learning for Diabetic Retinopathy Classification. IEEE Access, 12, 6768–6776. https://doi.org/10.1109/ACCESS.2024.3351473
