THE Cartesian premise that 'if something can't be quantified, its existence should be questioned' is implicit in much of modern medicine. Many of us are attracted to quantification, and in a world dominated by data, the idea of numeric certainty holds allure. Scoring systems for the purpose of appraising severity and outcome have thus been developed for the critically ill.1 While accurate and useful across groups of patients, their presence tempts clinicians to suppose that individual patients with good scores will do well, and individual patients with bad scores won't. Attractive as this may seem, it is wrong.

In the current issue of The Journal, Erhmann and colleagues examine a scoring system's performance in a particularly thorny area of critical care: establishing futility, and withdrawing life support.2 The authors examined 93 intensive care unit (ICU) patients and determined survival as a function of the change in logistic organ dysfunction (LOD) score between admission and day four.2 They hoped that the trajectory taken by each individual's score would better map out their fate than a single measurement. In short, those who developed progressively better scores would do well and likely survive; conversely, those developing progressively worse scores would do badly and likely die. It was this latter pattern that was thought to select out those for whom limitation of care might have been appropriate.

This paper2 revisits the use of a severity of illness score to limit care, an idea first investigated by Atkinson et al. in a very controversial manuscript just over a decade ago.3 That paper reported that, by and large, patients predicted to die died, and in doing so over a prolonged ICU stay consumed a great deal of health-care resources.3 However, not all of those predicted to die died; some lived with a good quality of life.
Far from attempting to save on financial costs through numbers-driven limitation of care, the paper focused on the human costs of getting it wrong.3 Thus, the current publication2 serves as a timely reminder of three generic issues related to the application of scoring systems: use, misuse, and abuse.

How have scoring systems been put to (good) use? Knaus et al.'s benchmark study4 used APACHE scores to compare actual and predicted death rates, using group results as the standard. The authors then constructed a ranking of institutional outcomes to demonstrate the superiority of some centres (and their characteristics) and, naturally enough, the inferiority of others. This grading disconcertingly resembles the ranking of the country's universities by a popular Canadian magazine,5 or the dubious designation of 'best doctor' by others. The assumption is that comparable severity of illness scores equate to comparable illness. Over large groups of patients with similar disease, this is true. If the groups are small or the populations dissimilar, then all bets are off, and validity is far less likely. The flaws inherent in scoring systems limit their applicability to quality assurance (between institutions, or over time in a single centre).

Similar limitations apply to the use of such ratings within clinical trials. Here, scoring systems serve three purposes: to screen trial entrants, as a 'surrogate' measure of outcome, or when implementing the results of the trial in the care of patients. Good scoring systems provide reliable data on the probability of survival of a group of patients. However, a banal example illustrates the difference between probabilities for groups and probabilities for individuals. Suppose the weights of 1,000 consecutive patients are entered into a database and 90% of the patients weigh less than 100 kg: thus approximately one patient in ten will weigh 100 kg or more.
There is no indication, without additional information, of which individual patients would weigh this much. No matter how such data were processed, there is no predictor of the actual weight of a patient; only the probability of their weight is available. The clinician may be bemused or perplexed by this analogy: he or she would simply weigh the patient, and have no need for a population-based scoring system. This simplicity is instructive: when the answer is available directly from the patient, the clinician should not look to a complex statistic. The analogy with ICU scoring systems is carried further by studies documenting that providing clinicians with objective, model-based estimates of the probability of survival does not influence mortality.6,7 Finally, there are pressing biological reasons why we should be skeptical of reliance on scoring systems. For example, the responses, and outcomes, of individual patients with sepsis may well be far more dependent on their particular genetic makeup (e.g., cytokine polymorphisms)8 than on derived prognostic parameters.

Beyond the issues of use and misuse, there lies the specter of abuse. Abuse, corresponding to bad treatment or exploitation, is especially worrisome in the context of the vulnerable critically ill patient. A most disturbing possibility is that a scoring system might be used alone to justify the withdrawal of active care, or the institution of palliation. We believe that four lines of reasoning explain why we should reject the notion of limiting care based on a scoring system. First, as discussed above, predictors derived from populations (as in this study) will never be robust indicators of the fate of individuals.
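The weight analogy above can be made concrete with a short sketch. The distribution parameters below are illustrative choices (not data from the paper); they are tuned only so that roughly 90% of simulated patients fall below 100 kg. The point the code demonstrates is the editorial's: the population yields a group-level probability, but identifies no particular patient.

```python
import random

random.seed(42)

# Illustrative cohort: 1,000 simulated patient weights, drawn so that
# roughly 90% fall below 100 kg (parameters are assumptions, not data).
weights = [random.gauss(80, 15.6) for _ in range(1000)]

# Group-level statement: the fraction at or above 100 kg.
p_heavy = sum(w >= 100 for w in weights) / len(weights)
print(f"P(weight >= 100 kg) ~ {p_heavy:.2f}")  # close to 0.10

# Individual-level statement: the distribution alone cannot say *which*
# patient is heavy; every patient carries the same prior probability.
prior_for_any_patient = p_heavy
```

Only by "weighing the patient", that is, observing the individual directly, does the probability collapse into a fact.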
Although some systems perform well,9,10 and despite well-demonstrated weaknesses in physicians' ability to predict outcome in critical illness,11,12 Canadian intensivists predict individual patient mortality better (more than sixfold) than either of two well-validated scoring systems.13 Probabilities, unless very close to 100% or to 0% (and then, only if highly accurate), simply have quite limited applicability to individual patients.

Second, straddling the gap between the statistical and the actual is the uneasy notion of futility. Clinicians often invoke futility, but their numerical definition of it varies,14 as does their ability to predict it. In a much-discussed study by the Canadian Critical Care Trials Group, actual ICU survival rates were higher than clinician-predicted survival across all ranges of survival predictions;13 of patients to whom physicians attributed a less than 10% chance of survival, 29% left the ICU and 22% returned home.13 Indeed, 3.6% of patients in whom mechanical ventilation was withdrawn (not weaned) survived to discharge from hospital.15 So much for futility.
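The comparison behind these findings is a calibration check: group patients by the predicted probability of survival, then compare the mean prediction in each group against the observed survival rate. A minimal sketch follows, using wholly hypothetical numbers chosen for illustration (not the Trials Group dataset).

```python
# Hypothetical predicted survival probabilities and outcomes (1 = survived).
predictions = [0.05, 0.08, 0.09, 0.07, 0.06, 0.50, 0.60, 0.55, 0.90, 0.85]
survived    = [1,    0,    1,    0,    0,    1,    1,    0,    1,    1]

# Focus on the "futile" bin: patients given a < 10% predicted survival.
low_bin = [(p, s) for p, s in zip(predictions, survived) if p < 0.10]

mean_predicted = sum(p for p, _ in low_bin) / len(low_bin)
observed = sum(s for _, s in low_bin) / len(low_bin)
print(f"predicted {mean_predicted:.2f} vs observed {observed:.2f}")
```

In this toy cohort the low bin carries a mean predicted survival of 7% yet an observed survival of 40%, the same shape of discrepancy (observed exceeding predicted) that the study reported, and the reason a low score alone cannot certify futility.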
Skrobik, Y., & Kavanagh, B. P. (2006). Scoring systems for the critically ill: use, misuse and abuse. Canadian Journal of Anesthesia/Journal canadien d'anesthésie, 53(5), 432–436. https://doi.org/10.1007/bf03022613