Large-Scale Annotation of Histopathology Images from Social Media

  • Schaumberg A
  • Juarez W
  • Choudhury S
  • et al.

Abstract

Large-scale annotated image datasets such as ImageNet and CIFAR-10 have been essential for developing and testing sophisticated new machine learning algorithms for natural vision tasks. Such datasets allow neural networks to be trained on the kinds of visual discriminations humans make in everyday activities, e.g. distinguishing classes of vehicles. An emerging field, computational pathology, applies such machine learning algorithms to the highly specialized vision task of diagnosing cancer or other diseases from pathology images. Importantly, labeling pathology images requires pathologists who have had decades of training, but because of the demands on pathologists' time (e.g. clinical service), obtaining a large annotated dataset of pathology images for supervised learning is difficult. To facilitate advances in computational pathology on a scale similar to those achieved in natural vision tasks using ImageNet, we leverage the power of social media. Pathologists worldwide share annotated pathology images on Twitter, which together provide thousands of diverse pathology images spanning many sub-disciplines. From Twitter, we assembled a dataset of 2,746 images from 1,576 tweets by 13 pathologists in 8 countries; each tweet includes both images and text commentary. To demonstrate the utility of these data for computational pathology, we apply machine learning to our new dataset to test whether we can accurately identify different stains and discriminate between different tissues. Using a Random Forest, we report (i) 0.959 ± 0.013 area under the receiver operating characteristic curve [AUROC] when identifying single-panel human hematoxylin and eosin [H&E] stained slides that are not overdrawn and (ii) 0.996 ± 0.004 AUROC when distinguishing H&E from immunohistochemistry [IHC] stained microscopy images.
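
The AUROC figures above are reported as a mean plus or minus a spread. As a hedged illustration of how such a number can be produced, the sketch below computes AUROC from classifier scores using its rank-based (Mann-Whitney) definition, the probability that a randomly chosen positive image scores higher than a randomly chosen negative one, and then summarizes several held-out folds as mean ± sample standard deviation. That the ± denotes a standard deviation across cross-validation folds is an assumption; the abstract does not say. The fold data are hypothetical toy values, and this is not the authors' Random Forest pipeline.

```python
from statistics import mean, stdev


def auroc(labels, scores):
    """AUROC as P(random positive outscores random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def summarize(fold_aurocs):
    """Format per-fold AUROCs as 'mean ± sample standard deviation' (assumed convention)."""
    return f"{mean(fold_aurocs):.3f} ± {stdev(fold_aurocs):.3f}"


# Hypothetical toy data: (labels, classifier scores) for three held-out folds,
# where label 1 could stand for "H&E" and 0 for "IHC".
folds = [
    ([1, 1, 0, 0], [0.9, 0.8, 0.3, 0.1]),
    ([1, 0, 1, 0], [0.7, 0.6, 0.4, 0.2]),
    ([1, 1, 0, 0], [0.6, 0.5, 0.5, 0.2]),
]
per_fold = [auroc(y, s) for y, s in folds]
print(per_fold)              # [1.0, 0.75, 0.875]
print(summarize(per_fold))   # 0.875 ± 0.125
```

The quadratic pairwise loop is chosen for clarity; for large test sets, a sort-based rank computation gives the same value more efficiently.
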
Moreover, we distinguish all pairs of breast, dermatological, gastrointestinal, genitourinary, and gynecological [gyn] pathology tissue types, with mean AUROC for any pairwise comparison ranging from 0.771 to 0.879. This range is 0.815 to 0.879 if gyn is excluded. We report 0.815 ± 0.054 AUROC when all five tissue types are considered in a single multiclass classification task. Our goal is to make this large-scale annotated dataset publicly available for researchers worldwide to develop, test, and compare their machine learning methods, an important step toward advancing the field of computational pathology.
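
With five tissue types there are C(5, 2) = 10 pairwise binary tasks; excluding gyn leaves C(4, 2) = 6. The minimal sketch below enumerates those tasks and, as one simple option, macro-averages per-pair AUROCs while optionally skipping a class. The averaging helper is a hypothetical illustration; the abstract does not state how the single multiclass AUROC was computed, so this should not be read as the authors' method.

```python
from itertools import combinations
from statistics import mean

TISSUES = ["breast", "dermatological", "gastrointestinal", "genitourinary", "gyn"]


def pairwise_tasks(tissues, exclude=()):
    """Every unordered pair of tissue types, i.e. every binary classification task."""
    kept = [t for t in tissues if t not in exclude]
    return list(combinations(kept, 2))


def macro_pairwise_auroc(pair_auroc, exclude=()):
    """Hypothetical helper: mean of per-pair AUROCs, skipping pairs with an excluded class."""
    vals = [v for (a, b), v in pair_auroc.items() if a not in exclude and b not in exclude]
    return mean(vals)


all_pairs = pairwise_tasks(TISSUES)
no_gyn = pairwise_tasks(TISSUES, exclude={"gyn"})
print(len(all_pairs), len(no_gyn))  # 10 6
```
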

Cite

Schaumberg, A. J., Juarez, W., Choudhury, S. J., Pastrian, L. G., Pritt, B. S., Pozuelo, M. P., … Fuchs, T. J. (2018). Large-Scale Annotation of Histopathology Images from Social Media. bioRxiv (p. 396663). Retrieved from https://www.biorxiv.org/content/early/2018/08/21/396663