
Inference in generalized additive mixed models by using smoothing splines

by X Lin, D Zhang
Journal of the Royal Statistical Society - Series B: Statistical Methodology (1999)

Abstract

Generalized additive mixed models are proposed for overdispersed and correlated data, which arise frequently in studies involving clustered, hierarchical and spatial designs. This class of models allows flexible functional dependence of an outcome variable on covariates by using nonparametric regression, while accounting for correlation between observations by using random effects. We estimate nonparametric functions by using smoothing splines and jointly estimate smoothing parameters and variance components by using marginal quasi-likelihood. Because numerical integration is often required by maximizing the objective functions, double penalized quasi-likelihood is proposed to make approximate inference. Frequentist and Bayesian inferences are compared. A key feature of the method proposed is that it allows us to make systematic inference on all model components within a unified parametric mixed model framework and can be easily implemented by fitting a working generalized linear mixed model by using existing statistical software. A bias correction procedure is also proposed to improve the performance of double penalized quasi-likelihood for sparse data. We illustrate the method with an application to infectious disease data and we evaluate its performance through simulation.

Xihong Lin
University of Michigan, Ann Arbor, USA
and Daowen Zhang
North Carolina State University, Raleigh, USA
[Received October 1997. Revised July 1998]
Keywords: Correlated data; Generalized linear mixed models; Laplace approximation; Marginal
quasi-likelihood; Nonparametric regression; Penalized quasi-likelihood; Smoothing parameters;
Variance components
1. Introduction
Generalized linear mixed models (GLMMs) (Breslow and Clayton, 1993) provide a unified
likelihood framework for parametric regression of a variety of overdispersed and correlated
outcomes. Data of this type arise in many fields of research, such as longitudinal studies,
survey sampling, clinical trials and disease mapping. A major difficulty in making inference
in GLMMs is that a full likelihood analysis is burdened by often intractable numerical
integration. Various approximate inference procedures (Breslow and Clayton, 1993; Lee and
Nelder, 1996; Lin and Breslow, 1996) and Bayesian procedures using EM algorithms and
Gibbs sampling (McCulloch, 1997; Zeger and Karim, 1991) have been proposed. For discussion
on full maximum likelihood estimation, see Aitkin (1998).
A key feature of GLMMs is that they use a parametric mean function to model covariate
effects, while accommodating overdispersion and correlation by adding random effects to the
linear predictor. However, this parametric mean assumption may not always be desirable,
since appropriate functional forms of the covariates may not be known in advance and the
outcome variable may depend on the covariates in a complicated manner. It is hence of
substantial interest to develop a nonparametric regression model for correlated data by
incorporating a nonparametric mean function in GLMMs. This will allow more flexible
functional dependence of the outcome variable on the covariates.
Address for correspondence: Xihong Lin, Department of Biostatistics, University of Michigan, 1420 Washington
Heights, Ann Arbor, MI 48109, USA.
E-mail: xlin@sph.umich.edu
© 1999 Royal Statistical Society 1369–7412/99/61381
J. R. Statist. Soc. B (1999) 61, Part 2, pp. 381–400
There is an extensive literature on nonparametric regression with independent data using
kernel and spline methods (Härdle, 1990; Green and Silverman, 1994). The generalized additive
models of Hastie and Tibshirani (1990) are widely used and well understood. However, only
very limited work has been done on nonparametric regression when the data are correlated.
Most researchers have restricted their attention to longitudinal data with normally distributed
outcomes and a single nonparametric function (Hart, 1991; Rice and Silverman, 1991).
Several researchers have incorporated a nonparametric time function in linear mixed models
(Zeger and Diggle, 1994; Zhang et al., 1998; Verbyla et al., 1998). For non-Gaussian longitudinal
data, Wild and Yee (1996) and Berhane and Tibshirani (1998) extended generalized
additive models to generalized estimating equations (Liang and Zeger, 1986). Few references
address the nonparametric modelling of correlated non-Gaussian outcomes within
the mixed effects model framework. See Verbyla (1995) for discussion on mixed model
formulation of smoothing splines in generalized linear models for independent non-Gaussian
data.
Nonparametric regression with correlated data faces many new challenges. In addition to
developing an inference procedure for nonparametric functions, we also need to consider how
to draw inference on correlation parameters. Another critical issue, whose importance has
been emphasized by many (Green and Silverman, 1994; Wahba, 1978), is how to select good
estimators of smoothing parameters and bandwidth parameters. Very limited work has been
done on these issues, especially estimation of the correlation parameters and the smoothing
parameters. Conventional data-driven methods for smoothing parameter estimation
face new problems. For example, although cross-validation (Rice and Silverman,
1991) is a reasonable approach to selecting the smoothing parameters for clustered data, it is
often computationally expensive, subsequent inference on the correlation parameters is
difficult (Zeger and Diggle, 1994), and it fails for crossed designs and spatial data. It is hence
of substantial interest to develop a systematic procedure to make inference on all model
parameters.
In this paper, we propose generalized additive mixed models (GAMMs), which are an
additive extension of GLMMs in the spirit of Hastie and Tibshirani (1990). This new class of
models uses additive nonparametric functions to model covariate effects while accounting for
overdispersion and correlation by adding random effects to the additive predictor. GAMMs
encompass nested and crossed designs and are applicable to clustered, hierarchical and
spatial data.
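As a rough sketch of the contrast described above, the parametric linear predictor of a
GLMM and the additive predictor of a GAMM can be written side by side. The notation
(link g, conditional mean \mu_i^b, covariates x_i and z_i, random effects b with covariance
D(\theta)) is assumed here for illustration and is not taken from this excerpt.

```latex
% GLMM: parametric mean function plus random effects in the linear predictor
g(\mu_i^{b}) = x_i^{T}\beta + z_i^{T} b ,
\qquad b \sim N\{0, D(\theta)\}

% GAMM: each covariate enters through an unspecified smooth function f_j
g(\mu_i^{b}) = \beta_0 + f_1(x_{i1}) + \cdots + f_p(x_{ip}) + z_i^{T} b ,
\qquad b \sim N\{0, D(\theta)\}
```

Under this notation, the only change from the GLMM to the GAMM is that the linear term
x_i^T \beta is replaced by a sum of smooth functions, while the random effects term is unchanged.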
We estimate the nonparametric functions by using smoothing splines and jointly estimate
the smoothing parameters and the variance components by using marginal quasi-likelihood.
This marginal quasi-likelihood approach is an extension of the restricted maximum likelihood
(REML) approach used by Wahba (1985) and Kohn et al. (1991) in the classical
nonparametric regression model (Kohn et al. (1991), equation (2.1)), and by Zhang et al.
(1998), Brumback and Rice (1998) and Wang (1998) in Gaussian nonparametric mixed
models, where they treated the smoothing parameter as an extra variance component.
Because numerical integration is often required by maximizing the objective functions,
double penalized quasi-likelihood (DPQL) is proposed to make approximate inference.
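The idea of treating a smoothing parameter as an extra variance component can be
illustrated in the Gaussian case with a small numerical check. The sketch below is not the
authors' smoothing spline estimator or DPQL: it assumes a hypothetical truncated-line basis,
simulated data and fixed variance values, and only verifies that penalized least squares
with penalty lam = sigma2_e / sigma2_b gives the same coefficients as the mixed model
estimates (generalized least squares for the fixed effects, best linear unbiased
prediction for the random effects).

```python
import numpy as np

# Simulated data (hypothetical, for illustration only)
rng = np.random.default_rng(0)
n = 50
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.3, n)

# Unpenalized part: intercept and slope
X = np.column_stack([np.ones(n), x])
# Penalized part: truncated-line basis at assumed knots
knots = np.linspace(0.1, 0.9, 9)
Z = np.maximum(x[:, None] - knots[None, :], 0.0)

# Assumed variance components; their ratio plays the role of the smoothing parameter
sigma2_e = 0.09
sigma2_b = 0.5
lam = sigma2_e / sigma2_b

# 1) Penalized least squares: minimise ||y - X beta - Z b||^2 + lam * ||b||^2
C = np.hstack([X, Z])
P = np.diag([0.0] * X.shape[1] + [1.0] * Z.shape[1])
coef_pls = np.linalg.solve(C.T @ C + lam * P, C.T @ y)

# 2) Mixed model: y = X beta + Z b + e, b ~ N(0, sigma2_b I), e ~ N(0, sigma2_e I)
V = sigma2_b * (Z @ Z.T) + sigma2_e * np.eye(n)
Vinv = np.linalg.inv(V)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
b_blup = sigma2_b * Z.T @ Vinv @ (y - X @ beta_gls)
coef_mm = np.concatenate([beta_gls, b_blup])

# The two sets of coefficients should agree up to rounding error
max_diff = np.max(np.abs(coef_pls - coef_mm))
print(f"largest coefficient difference: {max_diff:.2e}")
```

In this toy setting the variance components are fixed by hand. In practice they would be
estimated, for example by REML in the Gaussian case, which is what makes the smoothing
parameter estimable alongside the other variance components.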
