Background: Skin lesion classification through dermatoscopic images is the most common method for non-invasive diagnostics of dermatologic conditions. Feature extraction through deep learning (DL)-based convolutional neural networks (CNNs) provides insight into differential attributes of skin lesions that may pertain to their malignancy. In this study, we sought to improve the performance of standard CNN architectures in skin lesion classification by providing a machine learning (ML)-derived risk score from patient demographic data.
Methods: We isolated 1,340 patients (n=2,200) from the HAM10000 dataset with ground-truth diagnoses of either melanoma or benign keratosis-like lesions. Images were split into training, validation, and test sets, with equal representation of each class in each set. Baseline CNN performance was established by training 5 DL network architectures (Ni) with 3-fold cross-validation (CV), each of which employed leave-one-out CV and an early stopping criterion. Learning rate (LR) and weight decay (WD) were optimized to yield networks with the highest area under the receiver operating characteristic curve (AUC). For ML training, one-hot encoding was applied to demographic variables (age, sex, localization of lesion). The risk score produced by the ML classifier was added as an additional feature in the final convolutional layers while training CNNs, yielding deep ensemble networks (Ei); all optimized parameters were the same as for Ni.
Results: Amongst 7 ML classifiers, the random forest algorithm (MRF) yielded the highest test AUC of 0.710. Test AUCs did not differ significantly among the DL networks (Ni=0.81±0.04) or among the ensemble networks (Ei=0.88±0.03), demonstrating that network architecture did not significantly influence performance. A statistically significant increase in AUC was observed in Ei compared to Ni (P=4.23E−3), indicating a significant contribution from the inclusion of a demographic risk score. Furthermore, activation maps generated for network visualization of test set images show higher specificity of differential features to inform network prediction in Ei. Average predictions on the holdout dataset (Dholdout) are significantly closer to true values in Ei compared to Ni.
Conclusions: The ensemble inclusion of an ML risk stratifier from demographic data may improve DL binary classification of dermatoscopic lesions.
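As a rough illustration of the pipeline described in the Methods, the sketch below shows three of its building blocks in plain Python: one-hot encoding of demographic variables, appending a scalar risk score to an image feature vector, and computing AUC from labels and scores. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`fit_categories`, `one_hot`, `fuse`, `auc`), the example records, and the treatment of age as a categorical value are all hypothetical, and the random forest and CNN training steps are deliberately omitted.

```python
from typing import Dict, List, Sequence


def fit_categories(records: Sequence[Dict[str, object]],
                   fields: Sequence[str]) -> Dict[str, List[object]]:
    """Collect the distinct values observed for each demographic field."""
    return {f: sorted({r[f] for r in records}, key=str) for f in fields}


def one_hot(record: Dict[str, object],
            categories: Dict[str, List[object]]) -> List[float]:
    """Encode one record as concatenated indicator vectors, one per field."""
    vec: List[float] = []
    for field, values in categories.items():
        vec.extend(1.0 if record[field] == v else 0.0 for v in values)
    return vec


def fuse(image_features: Sequence[float], risk_score: float) -> List[float]:
    """Append a scalar risk score to an image feature vector (assumed fusion step)."""
    return list(image_features) + [float(risk_score)]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical demographic records; values are illustrative only.
records = [
    {"age": 45, "sex": "male", "localization": "back"},
    {"age": 70, "sex": "female", "localization": "face"},
    {"age": 45, "sex": "female", "localization": "back"},
]
categories = fit_categories(records, ["age", "sex", "localization"])
encoded = [one_hot(r, categories) for r in records]
print(encoded[0])
print(fuse([0.1, 0.2], 0.7))
print(auc([1, 0, 1, 0], [0.9, 0.2, 0.4, 0.6]))
```

In a fuller version, the encoded vectors would be passed to a trained classifier whose predicted probability serves as the risk score, and the fused vector would feed the network's final layers.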
CITATION
Roge, A., Ting, P., Chern, A., & Ting, W. (2023). Deep ensemble learning using a demographic machine learning risk stratifier for binary classification of skin lesions using dermatoscopic images. Journal of Medical Artificial Intelligence, 6. https://doi.org/10.21037/jmai-23-38