Comparing CNN-based and transformer-based models for identifying lung cancer: which is more effective?

Abstract

Lung cancer is the leading cause of cancer-related mortality. Recent evidence shows that early detection with computed tomography (CT) scans significantly reduces mortality rates. Given the remarkable progress of Vision Transformers (ViTs) in computer vision, we compared the performance of ViTs and Convolutional Neural Networks (CNNs) for the automatic identification of lung cancer on a dataset of 212 medical images. Importantly, neither ViTs nor CNNs require lung nodule annotations to predict the occurrence of cancer. To address the limited dataset size, we trained both ViTs and CNNs with three advanced techniques: transfer learning, self-supervised learning, and a sharpness-aware minimizer. We found that CNNs trained with self-supervised learning predict a patient’s cancer status highly accurately, achieving a recall of 93.4% and an area under the Receiver Operating Characteristic curve (AUC) of 98.1%. Our study demonstrates that both CNNs and ViTs show substantial potential with all three strategies; however, CNNs are more effective than ViTs when training data are limited.
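For readers unfamiliar with one of the three strategies mentioned above, the following is a minimal PyTorch sketch of a single sharpness-aware minimization (SAM) update: first ascend to a nearby "worst-case" set of weights, then apply the base optimizer using the gradient computed there. The function name, the rho value, and the surrounding training setup are illustrative assumptions, not the authors' implementation.

import torch

def sam_step(model, loss_fn, x, y, base_optimizer, rho=0.05):
    """One illustrative SAM update (sketch, not the paper's code)."""
    # First pass: gradient at the current weights.
    loss = loss_fn(model(x), y)
    loss.backward()

    # Scale of the adversarial weight perturbation.
    grad_norm = torch.norm(
        torch.stack([p.grad.norm(p=2) for p in model.parameters() if p.grad is not None]),
        p=2,
    )
    scale = rho / (grad_norm + 1e-12)

    # Perturb weights along the gradient direction, remembering the step.
    eps = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                eps.append(None)
                continue
            e = p.grad * scale
            p.add_(e)
            eps.append(e)
    model.zero_grad()

    # Second pass: gradient at the perturbed ("sharpness-aware") weights.
    loss_fn(model(x), y).backward()

    # Undo the perturbation, then take the real optimizer step.
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            if e is not None:
                p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()
    return loss.item()

In a hypothetical training loop, one would call sam_step(model, criterion, images, labels, optimizer) once per batch in place of the usual backward-and-step sequence; transfer learning and self-supervised pre-training would be applied before this fine-tuning stage.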

Citation (APA)

Gai, L., Xing, M., Chen, W., Zhang, Y., & Qiao, X. (2024). Comparing CNN-based and transformer-based models for identifying lung cancer: which is more effective? Multimedia Tools and Applications, 83(20), 59253–59269. https://doi.org/10.1007/s11042-023-17644-4
