Background: Machine learning models offer significant opportunities for improving health care, but their 'black-box' nature poses many risks. Methods: We built a custom Python module as part of a framework for generating artifacts that are tunable and describable, to accommodate future testing needs. Using a variety of generated artifacts, we analyzed their effects on a previously published digital pathology classification model and an internally developed kidney tissue segmentation model. The simulated artifacts were bubbles, tissue folds, uneven illumination, marker lines, uneven sectioning, altered staining, and tissue tears. Results: We found some performance degradation on tiles with artifacts, most notably with altered stains but also with marker lines, tissue folds, and uneven sectioning. We also found that deep learning models can respond nonlinearly to artifacts. Conclusions: Generated artifacts provide a useful tool for testing and building trust in machine learning models by revealing where these models might fail.
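The paper's artifact-generation module is not reproduced here, but a minimal sketch of one such tunable artifact, uneven illumination applied to an image tile, might look like the following. The function name and the `strength` tuning parameter are illustrative assumptions, not the authors' API:

```python
import numpy as np

def uneven_illumination(tile, strength=0.5):
    """Darken a tile with a radial gradient to simulate uneven illumination.

    tile: float array in [0, 1], shape (H, W) or (H, W, C).
    strength: maximum fractional darkening at the tile corners (tunable).
    """
    h, w = tile.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    # Distance from the tile center, normalized to 1 at the farthest corner.
    cy, cx = (h - 1) / 2, (w - 1) / 2
    dist = np.sqrt((ys - cy) ** 2 + (xs - cx) ** 2)
    dist /= dist.max()
    falloff = 1.0 - strength * dist
    if tile.ndim == 3:
        falloff = falloff[..., None]  # broadcast over color channels
    return np.clip(tile * falloff, 0.0, 1.0)

tile = np.ones((64, 64))          # a uniform white tile
dark = uneven_illumination(tile)  # center stays bright, corners darken
```

Because the artifact is parameterized, a stress test can sweep `strength` and record model accuracy at each setting, which is how a nonlinear response to artifact severity would be observed.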
Wang, N. C., Kaplan, J., Lee, J., Hodgin, J., Udager, A., & Rao, A. (2021). Stress testing pathology models with generated artifacts. Journal of Pathology Informatics, 12(1), 54. https://doi.org/10.4103/jpi.jpi_6_21