Statistical evaluation of prognostic versus diagnostic models: beyond the ROC curve.
BACKGROUND: Diagnostic and prognostic (predictive) models serve different purposes. Whereas diagnostic models are usually used for classification, prognostic models incorporate the dimension of time, adding a stochastic element. CONTENT: The ROC curve is typically used to evaluate clinical utility for both diagnostic and prognostic models. This curve assesses how well a test or model discriminates, that is, separates individuals into two classes, such as diseased and nondiseased. A strong risk predictor, such as lipids for cardiovascular disease, may have limited impact on the area under the curve (the AUC, or c-statistic), even if it alters predicted values. Calibration, which measures whether predicted probabilities agree with observed proportions, is another important component of model accuracy. Reclassification can directly compare the clinical impact of two models by determining how many individuals would be reclassified into clinically relevant risk strata. For example, adding high-sensitivity C-reactive protein and family history to prediction models for cardiovascular disease that use traditional risk factors moves approximately 30% of those at intermediate risk levels, such as 5%-10% or 10%-20% 10-year risk, into higher or lower risk categories, despite little change in the c-statistic. A calibration statistic can assess how well the new predicted values agree with those observed in the cross-classified data. SUMMARY: Although the ROC curve is useful for classification, evaluation of prognostic models should not rely on it alone, but should assess both discrimination and calibration. Risk reclassification can aid in comparing the clinical impact of two models on risk for the individual, as well as for the population.
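
The three quantities discussed above (discrimination, reclassification, and calibration within the cross-classified table) can be illustrated with a minimal Python sketch. It is not the analysis behind the abstract: the function names, the four-category cut points (taken loosely from the 5%-10% and 10%-20% strata mentioned in the text), and the six-person toy data set are all assumptions made for illustration. It also ignores censoring and follow-up time, which a real prognostic analysis would need to handle.

```python
from bisect import bisect_right
from collections import Counter

# Assumed cut points for 10-year risk: <5%, 5%-<10%, 10%-<20%, >=20%.
CUTS = (0.05, 0.10, 0.20)


def c_statistic(risks, events):
    """Fraction of case/non-case pairs in which the case has the higher
    predicted risk; tied predictions count as one half."""
    cases = [r for r, e in zip(risks, events) if e]
    noncases = [r for r, e in zip(risks, events) if not e]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                concordant += 1.0
            elif c == n:
                concordant += 0.5
    return concordant / (len(cases) * len(noncases))


def risk_category(risk, cuts=CUTS):
    """Index of the risk stratum (0 = lowest) that contains `risk`."""
    return bisect_right(cuts, risk)


def reclassification_table(old_risks, new_risks, cuts=CUTS):
    """Counts of individuals by (old category, new category)."""
    return Counter(
        (risk_category(o, cuts), risk_category(n, cuts))
        for o, n in zip(old_risks, new_risks)
    )


def percent_reclassified(table, from_categories):
    """Percentage of those starting in `from_categories` who change stratum."""
    total = sum(k for (old, _), k in table.items() if old in from_categories)
    moved = sum(
        k for (old, new), k in table.items()
        if old in from_categories and new != old
    )
    return 100.0 * moved / total if total else 0.0


def calibration_by_cell(old_risks, new_risks, events, cuts=CUTS):
    """For each cell of the cross-classified table, return
    (n, mean predicted risk under the new model, observed event proportion)."""
    sums = {}
    for o, n, e in zip(old_risks, new_risks, events):
        key = (risk_category(o, cuts), risk_category(n, cuts))
        count, pred, obs = sums.get(key, (0, 0.0, 0))
        sums[key] = (count + 1, pred + n, obs + int(bool(e)))
    return {k: (c, p / c, o / c) for k, (c, p, o) in sums.items()}


# Invented toy data: predicted 10-year risks under two hypothetical models.
old = [0.03, 0.07, 0.08, 0.12, 0.15, 0.25]
new = [0.04, 0.04, 0.09, 0.22, 0.16, 0.30]
events = [0, 0, 0, 1, 0, 1]

table = reclassification_table(old, new)
moved_pct = percent_reclassified(table, from_categories={1, 2})
cells = calibration_by_cell(old, new, events)
print("c-statistic, old model:", c_statistic(old, events))
print("c-statistic, new model:", c_statistic(new, events))
print("percent of intermediate-risk individuals reclassified:", moved_pct)
```

Comparing each cell's mean predicted risk with its observed proportion gives the raw ingredients for a calibration check in the cross-classified data. A formal calibration statistic would go further and combine the observed-versus-expected differences across cells.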