# Second Thoughts on the Bootstrap

by Bradley Efron

#### Abstract

This brief review article is appearing in the issue of Statistical Science that marks the 25th anniversary of the bootstrap. It concerns some of the theoretical and methodological aspects of the bootstrap and how they might influence future work in statistics.


Statistical Science 2003, Vol. 18, No. 2, 135–140. © Institute of Mathematical Statistics, 2003.

**Key words and phrases:** Plug-in principle, bootstrap confidence intervals, objective Bayes, BCa, ABC method.

My first thoughts on the bootstrap centered around variance and bias estimation. This was natural enough given the bootstrap's roots in the jackknife literature, with Quenouille (1949) on bias and Tukey (1958) on variance setting the agenda. The oldest note I can find says simply "What is the jackknife an approximation to?" Poor English, but a good question that resulted in the 1977 Rietz Lecture, "Bootstrap Methods: Another Look at the Jackknife" (Efron, 1979). Jaeckel's (1972) Bell Labs memorandum on the infinitesimal jackknife was particularly helpful in answering the approximation question.

Now it is 25 years later and the bootstrap baby is old enough to be in grad school. I have had some second thoughts about the bootstrap: its strengths and weaknesses, its foundations, what it can and cannot do, what it might do in the future. These second thoughts are what I will talk about, briefly, here. This volume is full of excellent essays that discuss and sometimes answer many of these questions in the context of authentic applications. So with apologies to the authors and the readers for any redundancy, here are a few comments and concerns.

## THE PLUG-IN PRINCIPLE

The diagram in Figure 1 describes a typical bootstrap application: an unknown probability model P, for example a logistic regression that depends on an unknown vector of coefficients, has yielded an observed data vector x.
*Bradley Efron is Professor of Statistics and Biostatistics and Max H. Stein Professor of Humanities and Sciences, Department of Statistics, Stanford University, Stanford, California 94305-4065 (e-mail: brad@stat.stanford.edu).*

From x we calculate a statistic θ̂ = s(x) intended to estimate a parameter θ = t(P) of particular importance, perhaps one of the unknown coefficients. We are interested in θ̂'s accuracy for estimating θ, with accuracy defined in terms of bias, variance, confidence intervals, prediction error or some other such measure.

The right half of the diagram describes the "bootstrap world" (in David Freedman's picturesque terminology): P̂ is a point estimate of P, obtained in the logistic regression example perhaps by substituting maximum likelihood estimates for the unknown coefficients. The estimate P̂ yields bootstrap data vectors x∗ and then bootstrap replications θ̂∗ = s(x∗). Since P̂ is completely known, we can generate as many θ̂∗'s as we want, or have time for, and use their observed variability to assess the accuracy of θ̂. During the past 25 years an enormous amount of statistical research has investigated the validity of the bootstrap approach. For most models P and most statistics θ̂, we know that the bootstrap standard deviation sd∗{θ̂∗} is a good estimator of the true standard deviation sd{θ̂}, and likewise for other accuracy measures.

The double arrow in Figure 1 indicates the estimation of P̂ from x. The utility of the bootstrap depends on the double-arrow process being easy to execute. It is particularly easy in the one-sample nonparametric case, where a completely unknown probability distribution gives x = (x₁, x₂, …, xₙ) by random sampling, in which case we can take P̂ to be the empirical distribution that puts probability 1/n on each xᵢ. Simply stated, the bootstrap is a device for upgrading a point estimate for P to an accuracy estimate for θ̂.
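The one-sample nonparametric recipe just described is short enough to sketch in code. The Python fragment below is an illustration of the general device, not code from the paper: it resamples from the empirical distribution, recomputes θ̂∗ = s(x∗) on each bootstrap data set, and reports the spread of the replications as the accuracy estimate. The statistic (the sample median) and the data are placeholders chosen here for illustration.

```python
import random
import statistics

def bootstrap_sd(x, s, B=2000, seed=0):
    """Nonparametric bootstrap estimate of the standard deviation of s(x).

    Resampling with replacement from the data is the plug-in step:
    the empirical distribution (probability 1/n on each x_i) stands in
    for the unknown P.
    """
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(B):
        x_star = [x[rng.randrange(n)] for _ in range(n)]  # bootstrap data set x*
        reps.append(s(x_star))                            # replication theta* = s(x*)
    return statistics.stdev(reps)

# Placeholder example: accuracy of the sample median of 20 observations.
rng = random.Random(1)
data = [rng.gauss(0, 1) for _ in range(20)]
se_median = bootstrap_sd(data, statistics.median)
```

Nothing in the sketch depends on the median; any statistic s that maps a data set to a number can be plugged in unchanged, which is the point of the diagram.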
Point estimates P̂ are so ubiquitous it comes as a shock when, as in some versions of the proportional hazards model, point estimates do not exist.

Figure 1 exemplifies the plug-in principle: we travel from the real world to the bootstrap world simply by
plugging in a point estimate P̂ for P. This is the only inference step. All other arrows on the right are exact analogs of those on the left. Plug-in methods are familiar friends in classical statistics, when, for instance, we estimate the standard deviation [p(1 − p)/n]^{1/2} of a binomial proportion p by [p̂(1 − p̂)/n]^{1/2}. Fisher extended the same tactic to information calculations for maximum likelihood estimators, substituting I(θ̂)^{−1/2} for I(θ)^{−1/2}. Our advantage is that modern computers allow us to carry out the plug-in principle with impunity, calculating P̂ → x∗ → θ̂∗ by brute force.

*FIG. 1. Typical bootstrap diagram. Unknown probability model P gives observed data x and we wish to know the accuracy of statistic θ̂ = s(x) for estimating the parameter of interest θ = t(P). Point estimate P̂ for P yields bootstrap data sets x∗. Accuracy is inferred from observed variability of bootstrap replications θ̂∗ = s(x∗).*

How far can the plug-in principle be trusted? "Pretty far" is a reasonable summary of current bootstrap research. Simple bootstrap ideas, like resampling from the empirical distribution, work surprisingly well in a surprisingly large catalog of cases, yet there are situations where plugging in starts to get worrisome.

*FIG. 2. Top panel: Histogram of z values for 444 main effects observed in genome data logistic regression. The beaded curve is a spline fitted to histogram counts. Bottom panel: First 10 of 50 bootstrap replications of the spline fit. The replications tend to be wider than the original beaded curve.*

Figure 2 concerns a genomics example. A total of 1391 HIV viral genomes were collected from AIDS patients who were taking various protease inhibitor (PI) drugs. The data for each genome comprise 74 numbers representing the amino acid present
at each of 74 positions on the viral protease gene, recorded as 0 or 1, respectively, if the amino acid was or was not the usual one present at that position in wild-type HIV: 1's indicate mutations caused by the drug treatment. The investigators wondered which of the six different PI drugs were associated with which mutations. Complicating matters, a majority of the 1391 patients took more than one PI (the average being 2.05); a few even took all six.

A logistic regression that had 444 = 6 × 74 main effects, one for each drug at each genome position, was fitted to the 1391 × 74 0–1 amino acid responses. This gave the 444 z values (coefficient estimate divided by standard error) that appear in the histogram in the top panel of Figure 2. The beaded curve is a smooth Poisson generalized linear model (GLM) fit to the histogram counts, performed using a natural spline with 7 degrees of freedom. The central peak is normal shaped with mean and standard deviation μ̂ = −0.38 and σ̂ = 1.20, where σ̂ is computed from the curvature of the spline fit at μ̂.

How accurately determined is the spline fit? The usual Poisson GLM standard errors are inappropriate since the 444 z values, and therefore the histogram counts, are mutually correlated. Instead I applied the nonparametric one-sample bootstrap with the 1391 genomes (each with its 74 numbers intact) as the resampling units. Each bootstrap data set gave bootstrap z values, a histogram and a natural spline fit. The bottom panel shows the first 10 of 50 bootstrap spline fits. The 50 bootstrap estimates σ̂∗, each computed in the same way as the original σ̂ = 1.20, had empirical mean 1.37 and empirical standard deviation 0.12. The value 0.12 is a reasonable estimate for the standard error of σ̂ = 1.20, but in this case there is some cause for concern about the plug-in principle: 43 of the 50 σ̂∗'s exceeded σ̂.
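The choice of resampling unit is the crux of this analysis: genomes are resampled as intact rows so that the correlations among each genome's 74 measurements survive into the bootstrap data sets. A minimal Python sketch of that step follows; the dimensions and the binary data here are invented for illustration, far smaller than the actual 1391 × 74 matrix, and the per-position means stand in for the full z-value/spline computation.

```python
import random

def resample_rows(rows, rng):
    """One bootstrap data set: draw whole rows (genomes) with replacement,
    keeping each row's entries intact so within-row correlations survive."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]

def col_means(rows):
    """Per-column means (mutation rates per position) for one data set."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

# Schematic stand-in for the 1391 x 74 binary matrix (dimensions invented).
rng = random.Random(0)
rows = [[rng.randint(0, 1) for _ in range(5)] for _ in range(40)]

# 50 bootstrap replications, mirroring the 50 spline refits in the text.
boot_means = [col_means(resample_rows(rows, rng)) for _ in range(50)]
```

Resampling individual 0–1 entries instead of whole rows would wrongly treat the correlated measurements as independent, which is exactly the mistake the usual Poisson GLM standard errors make here.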
In the bottom panel we can see that the bootstrapped curves systematically exceed the width of the original curve. It is easy to understand what is happening here: if the ith bootstrap z value zᵢ∗ has bootstrap mean and variance (zᵢ, vᵢ) (nearly true, with the vᵢ's roughly 1, except for zᵢ near 0, where they are smaller), then the empirical variance of the bootstrap histogram will be inflated by about v̄. We could correct the σ̂∗² values by subtracting v̄, but this takes us beyond the realm of the plug-in principle.

The "dilation phenomenon" in the bottom panel of Figure 2 occurs in classical situations, as with Stein estimation or the Neyman–Scott example. It points to a
