Large-Scale Annotation of Histopathology Images from Social Media

Andrew J. Schaumberg; Wendy Juarez; Sarah J. Choudhury; Laura G. Pastrian; Bobbi S. Pritt; Mario Prieto Pozuelo; Ricardo Sotillo Sanchez; Khanh Ho; Nusrat Zahra; Betul Duygu Sener; Stephen Yip; Bin Xu; Srinivas Rao Annavarapu; Aurelien Morini; Karra A. Jones; Kathia Rosado-Orozco; S. Joseph Sirintrapun; Mariam Aly; Thomas J. Fuchs

Preprint

bioRxiv (2018), 396663

Abstract

Large-scale annotated image datasets such as ImageNet and CIFAR-10 have been essential for developing and testing sophisticated new machine learning algorithms for natural vision tasks. Such datasets enable neural networks to learn visual discriminations that humans make in everyday activities, e.g., distinguishing classes of vehicles. An emerging field, computational pathology, applies such machine learning algorithms to the highly specialized vision task of diagnosing cancer or other diseases from pathology images. Importantly, labeling pathology images requires pathologists who have had decades of training, but because of the demands on pathologists' time (e.g., clinical service), obtaining a large annotated dataset of pathology images for supervised learning is difficult. To facilitate advances in computational pathology on a scale similar to those achieved in natural vision tasks with ImageNet, we leverage the power of social media. Pathologists worldwide share annotated pathology images on Twitter, which together provide thousands of diverse pathology images spanning many sub-disciplines. From Twitter, we assembled a dataset of 2,746 images from 1,576 tweets by 13 pathologists in 8 countries; each tweet includes both images and text commentary. To demonstrate the utility of these data for computational pathology, we apply machine learning to our new dataset to test whether we can accurately identify different stains and discriminate between different tissues. Using a Random Forest, we report (i) an Area Under the Receiver Operating Characteristic [AUROC] of 0.959 ± 0.013 when identifying single-panel human hematoxylin and eosin [H&E] stained slides that are not overdrawn and (ii) an AUROC of 0.996 ± 0.004 when distinguishing H&E from immunohistochemistry [IHC] stained microscopy images.
Moreover, we distinguish all pairs of breast, dermatological, gastrointestinal, genitourinary, and gynecological [gyn] pathology tissue types, with mean AUROC for any pairwise comparison ranging from 0.771 to 0.879. This range is 0.815 to 0.879 if gyn is excluded. We report an AUROC of 0.815 ± 0.054 when all five tissue types are considered in a single multiclass classification task. Our goal is to make this large-scale annotated dataset publicly available so that researchers worldwide can develop, test, and compare their machine learning methods, an important step toward advancing the field of computational pathology.
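
For readers unfamiliar with the metric, the following is a minimal Python sketch of how an AUROC can be computed and summarized across repeated evaluations, and of how the pairwise comparisons among the five tissue types can be enumerated. It is not the authors' code. The function names, the toy labels and scores, and the choice of sample standard deviation for the spread are all assumptions made for illustration; the abstract does not state how its reported spreads were computed.

```python
from itertools import combinations
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs in which the
    positive example receives the higher score; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Mean and sample standard deviation of a list of AUROC values
    (assumption: the abstract's spread may be defined differently)."""
    return mean(values), stdev(values)


# The five tissue types named in the abstract and every unordered pair of them.
TISSUES = [
    "breast",
    "dermatological",
    "gastrointestinal",
    "genitourinary",
    "gynecological",
]
PAIRS = list(combinations(TISSUES, 2))

# Toy data (hypothetical): three positives and three negatives with classifier scores.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

if __name__ == "__main__":
    print(f"toy AUROC: {auroc(toy_labels, toy_scores):.3f}")
    print(f"number of tissue pairs: {len(PAIRS)}")
```

In the toy data, 8 of the 9 positive/negative pairs are ranked correctly, so the AUROC is 8/9. In practice the scores would come from a trained classifier such as a Random Forest evaluated on held-out images, with one AUROC per held-out split passed to `summarize`.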

Citation (APA)

Schaumberg, A. J., Juarez, W., Choudhury, S. J., Pastrian, L. G., Pritt, B. S., Pozuelo, M. P., … Fuchs, T. J. (2018). Large-Scale Annotation of Histopathology Images from Social Media. bioRxiv (p. 396663). Retrieved from https://www.biorxiv.org/content/early/2018/08/21/396663
