Toward Evidence-Based Medical Statistics. 2: The Bayes Factor

Steven N. Goodman, MD, PhD

Bayesian inference is usually presented as a method for determining how scientific belief should be modified by data. Although Bayesian methodology has been one of the most active areas of statistical development in the past 20 years, medical researchers have been reluctant to embrace what they perceive as a subjective approach to data analysis. It is little understood that Bayesian methods have a data-based core, which can be used as a calculus of evidence. This core is the Bayes factor, which in its simplest form is also called a likelihood ratio. The minimum Bayes factor is objective and can be used in lieu of the P value as a measure of the evidential strength. Unlike P values, Bayes factors have a sound theoretical foundation and an interpretation that allows their use in both inference and decision making. Bayes factors show that P values greatly overstate the evidence against the null hypothesis. Most important, Bayes factors require the addition of background knowledge to be transformed into inferences: probabilities that a given conclusion is right or wrong. They make the distinction clear between experimental evidence and inferential conclusions while providing a framework in which to combine prior with current evidence.

This paper is also available at http://www.acponline.org.

Ann Intern Med. 1999;130:1005-1013.

From Johns Hopkins University School of Medicine, Baltimore, Maryland. For the current author address, see end of text.

In the first of two articles on evidence-based statistics (1), I outlined the inherent difficulties of the standard frequentist statistical approach to inference: problems with using the P value as a measure of evidence, internal inconsistencies of the combined hypothesis-test and P-value method, and how that method inhibits combining experimental results with background information.
Here, I explore, as nonmathematically as possible, the Bayesian approach to measuring evidence and combining information and epistemologic uncertainties that affect all statistical approaches to inference. Some of this presentation may be new to clinical researchers, but most of it is based on ideas that have existed at least since the 1920s and, to some extent, centuries earlier (2).

The Bayes Factor Alternative

Bayesian inference is often described as a method of showing how belief is altered by data. Because of this, many researchers regard it as nonscientific; that is, they want to know what the data say, not what our belief should be after observing them (3). Comments such as the following, which appeared in response to an article proposing a Bayesian analysis of the GUSTO (Global Utilization of Streptokinase and tPA for Occluded Coronary Arteries) trial (4), are typical.

When modern Bayesians include a "prior probability distribution for the belief in the truth of a hypothesis," they are actually creating a metaphysical model of attitude change . . . The result . . . cannot be field-tested for its validity, other than that it "feels" reasonable to the consumer. . . . The real problem is that neither classical nor Bayesian methods are able to provide the kind of answers clinicians want. That classical methods are flawed is undeniable – I wish I had an alternative . . . . (5)

This comment reflects the widespread misperception that the only utility of the Bayesian approach is as a belief calculus. What is not appreciated is that Bayesian methods can instead be viewed as an evidential calculus. Bayes theorem has two components: one that summarizes the data and one that represents belief. Here, I focus on the component related to the data: the Bayes factor, which in its simplest form is also called a likelihood ratio.
In Bayes theorem, the Bayes factor is the index through which the data speak, and it is separate from the purely subjective part of the equation. It has also been called the relative betting odds, and its logarithm is sometimes referred to as the weight of the evidence (6, 7). The distinction between evidence and error is clear when it is recognized that the Bayes factor (evidence) is a measure of how much the probability of truth (that is, 1 − prob(error), where prob is probability) is altered by the data. The equation is as follows:

$$\text{Prior Odds of Null Hypothesis} \times \text{Bayes Factor} = \text{Posterior Odds of Null Hypothesis}$$

where

$$\text{Bayes factor} = \frac{\text{Prob(Data, given the null hypothesis)}}{\text{Prob(Data, given the alternative hypothesis)}}$$

The Bayes factor is a comparison of how well two hypotheses predict the data. The hypothesis that predicts the observed data better is the one that is said to have more evidence supporting it.

See related article on pp 995-1004 and editorial comment on pp 1019-1021.
© 1999 American College of Physicians–American Society of Internal Medicine

Unlike the P value, the Bayes factor has a sound theoretical foundation and an interpretation that
allows it to be used in both inference and decision making. It links notions of objective probability, evidence, and subjective probability into a coherent package and is interpretable from all three perspectives. For example, if the Bayes factor for the null hypothesis compared with another hypothesis is 1/2, the meaning can be expressed in three ways.

1. Objective probability: The observed results are half as probable under the null hypothesis as they are under the alternative.
2. Inductive evidence: The evidence supports the null hypothesis half as strongly as it does the alternative.
3. Subjective probability: The odds of the null hypothesis relative to the alternative hypothesis after the experiment are half what they were before the experiment.

The Bayes factor differs in many ways from a P value. First, the Bayes factor is not a probability itself but a ratio of probabilities, and it can vary from zero to infinity. It requires two hypotheses, making it clear that for evidence to be against the null hypothesis, it must be for some alternative. Second, the Bayes factor depends on the probability of the observed data alone, not including unobserved "long run" results that are part of the P value calculation. Thus, factors unrelated to the data that affect the P value, such as why an experiment was stopped, do not affect the Bayes factor (8, 9).

Because we are so accustomed to thinking of "evidence" and the probability of "error" as synonymous, it may be difficult to know how to deal with a measure of evidence that is not a probability. It is helpful to think of it as analogous to the concept of energy. We know that energy is real, but because it is not directly observable, we infer the meaning of a given amount from how much it heats water, lifts a weight, lights a city, or cools a house. We begin to understand what "a lot" and "a little" mean through its effects.
So it is with the Bayes factor: It modifies prior probabilities, and after seeing how much Bayes factors of certain sizes change various prior probabilities, we begin to understand what represents strong evidence and weak evidence.

Table 1 shows us how far various Bayes factors move prior probabilities, on the null hypothesis, of 90%, 50%, and 25%. These correspond, respectively, to high initial confidence in the null hypothesis, equivocal confidence, and moderate suspicion that the null hypothesis is not true. If one is highly convinced of no effect (90% prior probability of the null hypothesis) before starting the experiment, a Bayes factor of 1/10 will move one to being equivocal (47% probability on the null hypothesis), but if one is equivocal at the start (50% prior probability), that same amount of evidence will be moderately convincing that the null hypothesis is not true (9% posterior probability). A Bayes factor of 1/100 is strong enough to move one from being 90% sure of the null hypothesis to being only 8% sure.

As the strength of the evidence increases, the data are more able to convert a skeptic into a believer or a tentative suggestion into an accepted truth. This means that as the experimental evidence gets stronger, the amount of external evidence needed to support a scientific claim decreases. Conversely, when there is little outside evidence supporting a claim, much stronger experimental evidence is required for it to be credible. This phenomenon can be observed empirically, in the medical community's reluctance to accept the results of clinical trials that run counter to strong prior beliefs (10, 11).

Bayes Factors and Meta-Analysis

There are two dimensions to the "evidence-based" properties of Bayes factors. One is that they are a proper measure of quantitative evidence; this issue will be further explored shortly. The other is that they allow us to combine evidence from different experiments in a natural and intuitive way.
To understand this, we must understand a little more of the theory underlying Bayes factors (12–14).

Every hypothesis under which the observed data are not impossible can be said to have some evidence for it. The strength of this evidence is proportional to the probability of the data under that hypothesis and is called the likelihood of the hypothesis. This use of the term "likelihood" must not be confused with its common language meaning of

Table 1. Final (Posterior) Probability of the Null Hypothesis after Observing Various Bayes Factors, as a Function of the Prior Probability of the Null Hypothesis

Strength of Evidence     Bayes Factor   Decrease in Probability of the Null Hypothesis
                                        From, %     To No Less Than, %
Weak                     1/5            90          64*
                                        50          17
                                        25          6
Moderate                 1/10           90          47
                                        50          9
                                        25          3
Moderate to strong       1/20           90          31
                                        50          5
                                        25          2
Strong to very strong    1/100          90          8
                                        50          1
                                        25          0.3

* Calculations were performed as follows: A probability (Prob) of 90% is equivalent to an odds of 9, calculated as Prob/(1 − Prob). Posterior odds = Bayes factor × prior odds; thus, (1/5) × 9 = 1.8. Probability = odds/(1 + odds); thus, 1.8/2.8 = 0.64.

15 June 1999 • Annals of Internal Medicine • Volume 130 • Number 12
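The calculation described in the footnote to Table 1 can be sketched in a few lines of Python. This is only an illustration of the three steps given there (probability to odds, multiply by the Bayes factor, odds back to probability). The function names `prob_to_odds`, `odds_to_prob`, and `posterior_probability` are chosen here for readability and do not come from the article.

```python
def prob_to_odds(prob: float) -> float:
    """Convert a probability to odds: Prob / (1 - Prob)."""
    return prob / (1.0 - prob)


def odds_to_prob(odds: float) -> float:
    """Convert odds back to a probability: odds / (1 + odds)."""
    return odds / (1.0 + odds)


def posterior_probability(prior_prob: float, bayes_factor: float) -> float:
    """Posterior probability of the null hypothesis.

    Follows the Table 1 footnote:
    posterior odds = Bayes factor x prior odds.
    """
    posterior_odds = bayes_factor * prob_to_odds(prior_prob)
    return odds_to_prob(posterior_odds)


if __name__ == "__main__":
    # Bayes factors and prior probabilities taken from Table 1.
    bayes_factors = [("1/5", 1 / 5), ("1/10", 1 / 10),
                     ("1/20", 1 / 20), ("1/100", 1 / 100)]
    priors = [0.90, 0.50, 0.25]

    for label, bf in bayes_factors:
        for prior in priors:
            post = posterior_probability(prior, bf)
            print(f"BF {label:>5}: prior {100 * prior:.0f}% "
                  f"-> posterior {100 * post:.1f}%")
```

For the footnoted entry, a prior of 90% with a Bayes factor of 1/5 gives posterior odds of 1.8 and a posterior probability of about 0.64, as in the table. The other entries agree with Table 1 once rounded to the precision shown there.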