Genomic models of short-term exposure accurately predict long-term chemical carcinogenicity and identify putative mechanisms of action

Abstract

Background: Despite an overall decrease in the incidence of and mortality from cancer, about 40% of Americans will be diagnosed with the disease in their lifetime, and around 20% will die of it. Current approaches to testing chemicals for carcinogenicity rely on the 2-year rodent bioassay, which is costly and time-consuming. As a result, fewer than 2% of the chemicals on the market have actually been tested. However, evidence accumulated to date suggests that gene expression profiles from model organisms exposed to chemical compounds reflect underlying mechanisms of action, and that these toxicogenomic models could be used in the prediction of chemical carcinogenicity.

Results: In this study, we used a rat-based microarray dataset from the NTP DrugMatrix Database to test the ability of toxicogenomics to model carcinogenicity. We analyzed 1,221 gene-expression profiles obtained from rats treated with 127 well-characterized compounds, including genotoxic and non-genotoxic carcinogens. We built a classifier that predicts a chemical's carcinogenic potential with an AUC of 0.78, and validated it on an independent dataset from the Japanese Toxicogenomics Project consisting of 2,065 profiles from 72 compounds. Finally, we identified differentially expressed genes associated with chemical carcinogenesis, and developed novel data-driven approaches for the molecular characterization of the response to chemical stressors.

Conclusion: Here, we validate a toxicogenomic approach to predict carcinogenicity and provide strong evidence that, with a larger set of compounds, we should be able to improve the sensitivity and specificity of the predictions. We found that the prediction of carcinogenicity is tissue-dependent. The results also confirm and expand upon previous studies implicating DNA damage, the peroxisome proliferator-activated receptor, the aryl hydrocarbon receptor, and regenerative pathology in the response to carcinogen exposure.
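
To make the reported metric concrete, the following is a minimal sketch of how an AUC can be computed from classifier scores, and of how an operating point can be chosen under unequal misclassification costs (the figure captions mention FP:FN costs of 1:1 and a cost of 5 for FN versus 1 for FP). It is not the authors' pipeline: the function names `roc_auc` and `min_cost_threshold` and the toy labels and scores are assumptions made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def min_cost_threshold(labels, scores, fn_cost=1.0, fp_cost=1.0):
    """Return (threshold, cost) minimizing fn_cost * FN + fp_cost * FP.

    A compound is predicted positive when its score is >= threshold.
    """
    candidates = sorted(set(scores)) + [float("inf")]
    best = None
    for t in candidates:
        fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < t)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= t)
        cost = fn_cost * fn + fp_cost * fp
        if best is None or cost < best[1]:
            best = (t, cost)
    return best


# Hypothetical toy data: 1 = carcinogenic, 0 = non-carcinogenic.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.35, 0.6, 0.5, 0.2]

auc = roc_auc(labels, scores)
equal_cost = min_cost_threshold(labels, scores)               # FP:FN = 1:1
fn_heavy = min_cost_threshold(labels, scores, fn_cost=5.0)    # FN costs 5, FP costs 1
print(auc, equal_cost, fn_heavy)
```

On this toy data, weighting false negatives more heavily moves the chosen threshold lower, trading specificity for sensitivity, which is the trade-off the two operating points in the captions represent.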

Figures

  • Figure 1. Principal component analysis (PCA) of the DrugMatrix. a) The first two principal components of all samples in the DrugMatrix dataset. b) Liver samples with color coding for controls and for samples treated with genotoxic or non-genotoxic compounds. c) Liver samples with color coding for carcinogenicity. doi:10.1371/journal.pone.0102579.g001
  • Figure 2. Defining the carcinogenome. a) Hierarchical clustering of 191 profiles/138 compounds (columns) and genes (rows), with each compound represented by the vector of ‘treatment vs. control’ differential expression t-scores. The heatmap is color-coded according to the significance level (q-values) of the corresponding t-scores. Notice the right cluster (top purple color bar) and its enrichment in carcinogenic (red) compounds (Fisher test p = 8.5×10⁻⁶). b) Top 10 genes ranked according to the number of compounds inducing their significant up-/downregulation (FDR ≤ 0.01 and fold-change ≥ 1.5; see complete list in Table S28 in File S2). Each gene was also tested for its association with carcinogenicity across compounds (‘Enrichment’ columns) by performing a Fisher test between the gene status (0: not differentially expressed; 1: differentially expressed) and the compounds’ status (+ = carcinogenic; − = non-carcinogenic). c) Contingency table detailing the distribution of the genes whose compound-induced up-/down-regulation pattern is significantly associated with carcinogenicity status of the compounds. doi:10.1371/journal.pone.0102579.g002
  • Figure 3. Classification results overview. Random resampling classification results on the DrugMatrix (top) as well as the TG-GATEs (bottom) datasets using 200 iterations. In addition, the results of a model trained on all DrugMatrix samples and tested on TG-GATEs (middle) are shown. Results based on the regular gene expression data and on the data projected onto pathway space (canonical pathways of MSigDB – C2:CP, see Methods) are reported. For each testing scheme, area under the receiver operating characteristic (ROC) curve (AUC), as well as accuracy, sensitivity and specificity of a classifier trained with a zero-one loss function (FP:FN = 1:1), and 95% confidence intervals are reported. doi:10.1371/journal.pone.0102579.g003
  • Figure 4. ROC curve and variable importance for carcinogenicity prediction. ROC curve of random forest classification in liver of: a) genotoxicity and b) carcinogenicity. For carcinogenicity, tissue-specific class labels from the Carcinogenic Potency Database (CPDB) were used. The red curves show the mean of the 200 reruns, whereas the dashed curves indicate the first and third quartiles. The teal dot indicates a classifier assigning equal costs to false positives (FP) and false negatives (FN) (zero-one loss), whereas the blue dot indicates a classifier assigning a cost of 5 for FN and 1 for FP. c) Variable importance of the random forest model. Blue denotes genes that are down-regulated in the carcinogenic group, whereas red denotes up-regulation. doi:10.1371/journal.pone.0102579.g004
  • Figure 5. Classification learning curves as a function of the number of chemicals for: a) genotoxicity and b) carcinogenicity in liver.
  • Table 1. AUC for different time points and doses in TG-GATEs.
  • Table 2. Validation of prediction using pathological items.
  • Figure 6. Putative Modes of Action of carcinogenic chemical compounds. a) Classification performance (AUC, averaged over 100 iterations of random resampling) of a random forest classifier as a function of the number of gene sets used as predictors. 150 gene sets are needed to reach maximum AUC, while 50 are sufficient to get 99% of the expected maximum AUC. b) Heatmaps of the top 50 pathways as ranked by their variable importance derived from a random forest classifier of hepato-carcinogenicity. Rows correspond to pathways, clustered into biological processes; columns correspond to chemical compounds. The left and right heatmaps show all non-carcinogenic and carcinogenic compounds, respectively. Only profiles corresponding to maximum duration and dose treatments, with replicates averaged, are displayed. A detailed version of the right heatmap with all pathways and compounds labeled is available in Figure S11. c) Details of the biological processes associated with the clustering, showing the single differentially regulated pathways and their variable importance ranking, as well as the driving genes. doi:10.1371/journal.pone.0102579.g006
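
The Figure 2 caption describes a Fisher test between a gene's status (differentially expressed or not) and the compounds' carcinogenicity status. The sketch below shows what such a test on a 2×2 contingency table might look like, using only the Python standard library. The function name `fisher_exact_two_sided` and the counts are hypothetical, chosen for illustration, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption, since the caption does not state which variant was applied.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more probable than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical counts for one gene across 20 compounds:
# rows = gene differentially expressed (yes / no),
# columns = compound carcinogenic (+) / non-carcinogenic (-).
p_value = fisher_exact_two_sided(8, 2, 1, 9)
print(p_value)
```

A small p-value here would indicate that the gene's differential expression is associated with carcinogenicity status across compounds; with thousands of genes tested, a multiple-testing correction would also be needed.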

Cite

Gusenleitner, D., Auerbach, S. S., Melia, T., Gómez, H. F., Sherr, D. H., & Monti, S. (2014). Genomic models of short-term exposure accurately predict long-term chemical carcinogenicity and identify putative mechanisms of action. PLoS ONE, 9(7). https://doi.org/10.1371/journal.pone.0102579
