Bayesian multilevel analysis and MCMC

Abstract

Multilevel models have gained wide acceptance over the past 20 years in many fields, including education and medicine [e.g., 26, 43, 45], as an important methodology for dealing appropriately with nested or clustered data. The idea of conducting an experiment in such a way that the levels of one factor are nested inside those of another goes back all the way to the initial development, in the 1920s, of the analysis of variance (ANOVA; [34]), so there is nothing new in working with nested data; the novelty in recent decades lies in the methods for fitting multilevel models, the ability to work with data possessing many levels of nesting and multiple predictor variables at any or all levels, and increased flexibility in distributional assumptions. The earliest designs featured one-way ANOVA models such as

$$y_{ij} = \mu + \alpha_j^T + a_{ij}^S, \quad j = 1, \ldots, J, \quad i = 1, \ldots, n_j, \quad \sum_{j=1}^{J} n_j = N, \quad \sum_{j=1}^{J} \alpha_j^T = 0, \quad a_{ij}^S \overset{\text{iid}}{\sim} N(0, \sigma_S^2), \tag{2.1}$$

in which the subject factor $S$ (indexed by $i$), treated as random, is nested within the treatment factor $T$ (indexed by $j$), treated as fixed. Under the normality assumption in (2.1), such models required little for the (frequentist) estimation of the parameters $\mu$, $\sigma_S^2$, and the $\alpha_j^T$ beyond minor extensions of the least-squares methods known since the time of Legendre [51] and Gauss [36]. Regarding the treatment factor as random as well, however, by changing the $\alpha_j^T$ to $a_j^T \overset{\text{iid}}{\sim} N(0, \sigma_T^2)$ (with the $a_j^T$ and $a_{ij}^S$ mutually independent), created substantial new difficulties in model fitting; indeed, as late as the 1950s, one of the leading estimation methods [e.g., 65] was based on unbiased estimates of the variance components $\sigma_T^2$ and $\sigma_S^2$, the former of which can easily, and embarrassingly, go negative when $\sigma_T^2$ is small. Fisher [33] had much earlier pioneered the use of maximum likelihood estimation, but before the widespread use of fast computers this approach was impractical in random-effects and mixed models such as

$$y_{ij} = \beta_0 + \beta_1 (x_{ij} - \bar{x}) + a_j^T + a_{ij}^S, \quad j = 1, \ldots, J, \quad i = 1, \ldots, n_j, \quad \sum_{j=1}^{J} n_j = N, \quad a_j^T \overset{\text{iid}}{\sim} N(0, \sigma_T^2), \quad a_{ij}^S \overset{\text{iid}}{\sim} N(0, \sigma_S^2), \tag{2.2}$$

(where the $x_{ij}$ are fixed known values of a predictor variable and $\bar{x}$ is the sample mean of this variable), because the likelihood equations in such models can only be solved iteratively.

Multilevel modeling entered a new phase in the 1980s, with the development of computer programs such as ML3, VARCL, and HLM, which used likelihood-based estimation approaches built on iterative generalized least squares [42], Fisher scoring [52], and the EM algorithm [e.g., 15], respectively. In particular, the latest versions of MLwiN (the successor to ML3; [60]) and HLM [66] have worldwide user bases in the social and biomedical sciences numbering in the thousands, and likelihood-based fitting of at least some multilevel models is also now available in more general-purpose statistical packages such as SAS [64] and Stata [71]. However, the use of the likelihood function alone in multilevel modeling can lead to the following technical problems.

Maximum-likelihood estimates (MLEs) and their (estimated asymptotic) standard errors (SEs) can readily be found by iterative means for the parameters in Gaussian multilevel models such as (2.2), but interval estimates of those parameters can be problematic when $J$, the number of level-2 units, is small. For example, simple "95%" intervals of the form $\hat{\sigma}_T^2 \pm 1.96\, \widehat{\text{se}}(\hat{\sigma}_T^2)$ (based on the large-sample Gaussian repeated-sampling distribution of $\hat{\sigma}_T^2$) can go negative and can have actual coverage levels substantially below 95%, and other methods based only on $\hat{\sigma}_T^2$ and $\widehat{\text{se}}(\hat{\sigma}_T^2)$ (which are the default outputs of packages such as MLwiN and HLM) are not guaranteed to do much better, in part because (with small sample sizes) the MLE of $\sigma_T^2$ can be 0 even when the true value of $\sigma_T^2$ is well away from 0 [e.g., 12].

The situation becomes even more difficult when the outcome variable $y$ in the multilevel model is dichotomous rather than Gaussian, as in random-effects logistic regression (RELR) models such as

$$(y_{ij} \mid p_{ij}) \overset{\text{indep}}{\sim} \text{Bernoulli}(p_{ij}), \quad \text{logit}(p_{ij}) = \beta_0 + \beta_1 (x_{ij} - \bar{x}) + u_j, \quad u_j \overset{\text{iid}}{\sim} N(0, \sigma_u^2). \tag{2.3}$$

Here the likelihood methods that work with Gaussian outcomes fail; the likelihood function itself cannot even be evaluated without integrating out the random effects $u_j$ from (2.3).
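In standard notation, the quantity that fitting methods must somehow approximate is the marginal likelihood obtained by integrating each $u_j$ against its Gaussian distribution, a product of $J$ integrals with no closed form:

$$L(\beta_0, \beta_1, \sigma_u^2 \mid y) = \prod_{j=1}^{J} \int_{-\infty}^{\infty} \Bigg[ \prod_{i=1}^{n_j} p_{ij}^{\,y_{ij}} (1 - p_{ij})^{1 - y_{ij}} \Bigg] \frac{1}{\sigma_u \sqrt{2\pi}} \exp\!\left( -\frac{u_j^2}{2 \sigma_u^2} \right) du_j, \qquad p_{ij} = \text{logit}^{-1}\!\big[ \beta_0 + \beta_1 (x_{ij} - \bar{x}) + u_j \big].$$

Each integral has to be approximated numerically or sidestepped altogether, which is what motivates the approaches described next.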
Available software such as MLwiN fits RELR models via quasi-likelihood methods [7]; this approach to fitting nonlinear models such as (2.3) proceeds by linearizing the second line of the model via Taylor series expansion, yielding marginal and penalized quasi-likelihood (MQL and PQL) estimates according to the form of the expansion used. These are not full likelihood methods and would be better termed likelihood-based techniques. Browne and Draper [12] have shown that the actual coverage of nominal 95% interval estimates with this approach in RELR models can be far less than 95% when the intervals are based only on MQL and PQL point estimates and their (estimated asymptotic) SEs; see Section 2.3.3 below. Calibration results of this kind for other methods that attempt to approximate the actual likelihood function more accurately [e.g., 1, 50, 53, 57, 61] are sparse and do not yet fully cover the spectrum of models in routine use, and user-friendly software for many of these methods is still hard to come by.

This chapter concerns the Bayesian approach to fitting multilevel models, which (a) attempts to remedy the above problems (though not without introducing some new challenges of its own) and (b) additionally provides a mechanism for the formal incorporation of any prior information that may be available about the parameters of the multilevel model of interest, external to the current data set. A computing revolution based on Markov chain Monte Carlo (MCMC) methods, together with the availability of much faster (personal) computers, has made the Bayesian fitting of multilevel models increasingly easy since the early 1990s. In this chapter I (1) describe the basic outline of a Bayesian analysis (multilevel or not), in the context of a case study, (2) motivate the need for simulation-based computing methods, (3) describe MCMC methods in general and their particular application to multilevel modeling, (4) discuss MCMC diagnostic methods (to ensure accuracy of the computations), and (5) present an MCMC solution to the multilevel modeling case study. © 2008 Springer Science+Business Media, LLC.
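As a minimal illustration of the MCMC approach (using the PyMC library, with simulated data, weakly informative priors, and sampler settings chosen purely for the example, not taken from the chapter's case study), a model of the form (2.3) can be fit as follows:

```python
# Minimal illustration: MCMC fitting of a random-effects logistic
# regression of the form (2.3) using the PyMC library. The simulated
# data, priors, and sampler settings are example choices only.
import numpy as np
import pymc as pm
import arviz as az

rng = np.random.default_rng(1)
J, n_per = 12, 20                               # J level-2 units, n_j = 20 each
group = np.repeat(np.arange(J), n_per)          # cluster index for each observation
x = rng.normal(size=J * n_per)                  # level-1 predictor x_ij
u_true = rng.normal(0.0, 0.7, size=J)           # cluster effects used to simulate y
eta_true = -0.5 + 1.0 * (x - x.mean()) + u_true[group]
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-eta_true)))

with pm.Model() as relr:
    # Weakly informative priors (example choices)
    beta0 = pm.Normal("beta0", mu=0.0, sigma=10.0)
    beta1 = pm.Normal("beta1", mu=0.0, sigma=10.0)
    sigma_u = pm.HalfNormal("sigma_u", sigma=2.0)
    u = pm.Normal("u", mu=0.0, sigma=sigma_u, shape=J)   # u_j ~ N(0, sigma_u^2)

    # logit(p_ij) = beta0 + beta1 (x_ij - x_bar) + u_j, as in (2.3)
    eta = beta0 + beta1 * (x - x.mean()) + u[group]
    pm.Bernoulli("y_obs", logit_p=eta, observed=y)

    # MCMC draws; the u_j are sampled jointly with the other unknowns,
    # so no analytic integration over the random effects is required.
    trace = pm.sample(draws=1000, tune=1000, chains=2, random_seed=1)

print(az.summary(trace, var_names=["beta0", "beta1", "sigma_u"]))
```

Because the random effects $u_j$ are sampled jointly with $\beta_0$, $\beta_1$, and $\sigma_u^2$, interval estimates come directly from posterior quantiles rather than from a point estimate plus an asymptotic standard error, provided the usual MCMC diagnostics indicate that the chains have mixed adequately.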

Citation

Draper, D. (2008). Bayesian multilevel analysis and MCMC. In Handbook of Multilevel Analysis (pp. 77–139). Springer New York. https://doi.org/10.1007/978-0-387-73186-5_2
