In pathology, immunohistochemical (IHC) staining of tissue sections is routinely used to diagnose and grade malignant tumors. Typically, a trained pathologist interprets the IHC stain manually by counting each positively and negatively stained cell under a microscope. This manual enumeration suffers from poor reproducibility, even in the hands of expert pathologists. To facilitate this process, we propose a novel method for creating artificial datasets with known ground truth, which allows us to analyze recall, precision, accuracy, and intra- and inter-observer variability systematically and to compare different computer analysis approaches. Our method employs a conditional Generative Adversarial Network (cGAN) that uses a database of Ki67-stained tissues from breast cancer patients to generate synthetic digital slides. Our experiments show that readers could not reliably distinguish synthetic images from real ones: six readers (three pathologists and three image analysts) tried to differentiate 15 real from 15 synthetic images, and the probability that the average reader would correctly classify an image as synthetic or real more than 50% of the time was only 44.7%.
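To illustrate why a known ground truth matters, the following minimal Python sketch computes recall, precision, and accuracy for a cell-classification method. It is not the paper's evaluation code. The names `Counts`, `confusion_counts`, `precision`, `recall`, and `accuracy` are hypothetical, and the sketch assumes that every cell carries a binary label (Ki67-positive or Ki67-negative) and that predicted cells have already been matched one-to-one with ground-truth cells.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Counts:
    """Confusion-matrix counts for positive vs. negative stained cells."""
    tp: int  # positive cells labeled positive
    fp: int  # negative cells labeled positive
    fn: int  # positive cells labeled negative
    tn: int  # negative cells labeled negative


def confusion_counts(truth: Sequence[bool], predicted: Sequence[bool]) -> Counts:
    """Compare per-cell predictions with ground truth (True = positive)."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    tp = sum(t and p for t, p in zip(truth, predicted))
    fp = sum((not t) and p for t, p in zip(truth, predicted))
    fn = sum(t and (not p) for t, p in zip(truth, predicted))
    tn = sum((not t) and (not p) for t, p in zip(truth, predicted))
    return Counts(tp, fp, fn, tn)


def precision(c: Counts) -> float:
    """Fraction of cells labeled positive that are truly positive."""
    denom = c.tp + c.fp
    return c.tp / denom if denom else 0.0


def recall(c: Counts) -> float:
    """Fraction of truly positive cells that were labeled positive."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else 0.0


def accuracy(c: Counts) -> float:
    """Fraction of all cells that were labeled correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Toy labels for six cells in one synthetic image (illustrative values only).
truth = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
counts = confusion_counts(truth, predicted)
```

Because a synthetic slide comes with an exact label for each cell, these quantities can be computed without relying on a manual reference count that is itself subject to observer variability.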
Senaras, C., Muhammad, K. K. N., Sahiner, B., Pennell, M. P., Tozbikian, G., Lozanski, G., & Gurcan, M. N. (2018). Optimized generation of high-resolution phantom images using cGAN: Application to quantification of Ki67 breast cancer images. PLoS ONE, 13(5). https://doi.org/10.1371/journal.pone.0196846