Determining the clinical applicability of machine learning models through assessment of reporting across skin phototypes and rarer skin cancer types: A systematic review

Abstract

Machine learning (ML) models for skin cancer recognition may have variable performance across different skin phototypes and skin cancer types. Overall performance metrics alone are insufficient to detect poor subgroup performance. We aimed (1) to assess whether studies of ML models reported results separately for different skin phototypes and rarer skin cancers, and (2) to graphically represent the skin cancer training datasets used by current ML models. In this systematic review, we searched PubMed, Embase and CENTRAL. We included all studies in medical journals from 1 January 2012 to 22 September 2021 that assessed an ML technique for skin cancer diagnosis using clinical or dermoscopic images. No language restrictions were applied. We considered rarer skin cancers to be skin cancers other than pigmented melanoma, basal cell carcinoma and squamous cell carcinoma. We identified 114 studies for inclusion. Rarer skin cancers were included in 8/114 studies (7.0%), and results for a rarer skin cancer were reported separately in 1/114 studies (0.9%). Performance was reported across all skin phototypes in 1/114 studies (0.9%), but performance in skin phototypes I and VI was uncertain because these phototypes were minimally represented in the test dataset (9/3756 and 1/3756 images, respectively). For training datasets, public datasets were the most frequently used, the most common being the International Skin Imaging Collaboration (ISIC) archive (65/114 studies, 57.0%), but the largest datasets were private. Our review identified that most ML models did not report performance separately for rarer skin cancers and different skin phototypes. A degree of variability in ML model performance across subgroups is expected, but the current lack of transparency is not justifiable and risks models being used inappropriately in populations in whom accuracy is low.
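
A short calculation can illustrate why performance estimates from a test subgroup of 9 or 1 images are so uncertain. The following is a minimal sketch, not the review's own analysis: it assumes a Wilson score interval as the measure of uncertainty (the abstract does not say which method, if any, was used) and a hypothetical best case in which the model classifies every image in the subgroup correctly. The function names `pct` and `wilson_interval` are illustrative only. The sketch also re-derives the proportions quoted in the abstract.

```python
import math


def pct(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%).

    Assumption: this interval is chosen here for illustration; the source
    does not state how uncertainty was quantified.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Proportions quoted in the abstract.
    print("Rarer skin cancers included:", pct(8, 114), "%")
    print("Separate results reported:", pct(1, 114), "%")
    print("ISIC archive used:", pct(65, 114), "%")

    # Subgroup sizes from the abstract; the "all correct" outcome is hypothetical.
    for label, n in [("phototype I", 9), ("phototype VI", 1)]:
        low, high = wilson_interval(successes=n, n=n)
        print(f"{label}: n={n}, hypothetical 100% accuracy, "
              f"interval {low:.2f} to {high:.2f}")
```

Even under this best-case assumption, the lower bound of the interval stays far below 1 for both subgroups, which is consistent with the abstract's point that performance in phototypes I and VI cannot be judged from so few test images.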

Cite

Steele, L., Tan, X. L., Olabi, B., Gao, J. M., Tanaka, R. J., & Williams, H. C. (2023, April 1). Determining the clinical applicability of machine learning models through assessment of reporting across skin phototypes and rarer skin cancer types: A systematic review. Journal of the European Academy of Dermatology and Venereology. John Wiley and Sons Inc. https://doi.org/10.1111/jdv.18814
