Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

Kym I.E. Snell; Joie Ensor; Thomas P.A. Debray; Karel G.M. Moons; Richard D. Riley

Journal Article (Open Access)

Statistical Methods in Medical Research (2018) 27(11) 3505-3522

DOI: 10.1177/0962280217705678

Abstract

If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model’s discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of ‘true’ performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution of the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and of possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and we therefore recommend using these scales for meta-analysis. An illustrative example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
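As a rough illustration of the recommendation in the abstract, the sketch below pools study-level C-statistics on the logit scale and back-transforms the result. It is not the authors' code. The input values are made up for illustration, the standard errors are moved to the transformed scale with a delta-method approximation, and the DerSimonian-Laird estimator is used only because it is simple to write out; the paper may use a different estimation method. All function and variable names are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit_c_with_se(c, se_c):
    """Delta-method approximation: SE(logit c) ~= SE(c) / (c * (1 - c))."""
    return logit(c), se_c / (c * (1.0 - c))


def log_eo_with_se(eo, se_eo):
    """Delta-method approximation: SE(log E/O) ~= SE(E/O) / (E/O)."""
    return math.log(eo), se_eo / eo


def dersimonian_laird(estimates, ses):
    """Random-effects pooling; returns (pooled estimate, its SE, tau^2)."""
    k = len(estimates)
    w = [1.0 / s ** 2 for s in ses]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    # Cochran's Q and the moment-based between-study variance estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    scale = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / scale) if scale > 0 else 0.0
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


# Hypothetical study-level C-statistics and standard errors (not from the paper).
c_stats = [0.72, 0.80, 0.76, 0.85, 0.69]
c_ses = [0.020, 0.030, 0.025, 0.020, 0.030]

transformed = [logit_c_with_se(c, se) for c, se in zip(c_stats, c_ses)]
pooled_logit, pooled_se, tau2 = dersimonian_laird(
    [t[0] for t in transformed], [t[1] for t in transformed]
)

# Back-transform the summary and its approximate 95% interval to the C-statistic scale.
pooled_c = inv_logit(pooled_logit)
ci_low = inv_logit(pooled_logit - 1.96 * pooled_se)
ci_high = inv_logit(pooled_logit + 1.96 * pooled_se)
print(f"pooled C = {pooled_c:.3f} ({ci_low:.3f} to {ci_high:.3f}), tau^2 = {tau2:.3f}")
```

The same `dersimonian_laird` function could be applied to E/O ratios after passing them through `log_eo_with_se` and back-transforming with `math.exp`. Working on the transformed scale also keeps the back-transformed interval inside the valid range of each measure (0 to 1 for the C-statistic, positive for E/O).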

Citation (APA)

Snell, K. I. E., Ensor, J., Debray, T. P. A., Moons, K. G. M., & Riley, R. D. (2018). Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures? Statistical Methods in Medical Research, 27(11), 3505–3522. https://doi.org/10.1177/0962280217705678