Multilevel and related models for longitudinal data

Abstract

Longitudinal data, often called repeated measurements in medicine and panel data in the social sciences, arise when units provide responses on multiple occasions. Such data can be thought of as clustered or two-level data, with occasions $i$ at level 1 and units $j$ at level 2. One feature distinguishing longitudinal data from other types of clustered data is the chronological ordering of the responses, which implies that level-1 units cannot be viewed as exchangeable. Another feature of longitudinal data is that they often consist of a large number of small clusters.

A typical aim in longitudinal analysis is to investigate the effects of covariates both on the overall level of the responses and on changes of the responses over time. An important merit of longitudinal designs is that they allow the separation of cross-sectional and longitudinal effects. They also allow the investigation of heterogeneity across units, both in the overall level of the response and in its development over time. Heterogeneity not captured by observed covariates produces dependence among responses even after controlling for those covariates. This violates the independence assumption of ordinary regression models and must be accommodated to avoid invalid inference.

It is useful to distinguish between longitudinal data with balanced and unbalanced occasions. The occasions are balanced if all units are measured at the same time points $t_i$, $i = 1, \ldots, n$. They are unbalanced if units are measured at different time points $t_{ij}$, $i = 1, \ldots, n_j$. In the case of balanced occasions, the data can also be viewed as single-level multivariate data in which responses at different occasions are treated as different variables. One advantage of the univariate multilevel approach taken here is that unbalanced occasions and missing data are accommodated without resorting to complete case analysis (sometimes called listwise deletion).
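The claim that unobserved heterogeneity in level and in development over time induces dependence can be made concrete. Below is a minimal sketch assuming a linear model with a random intercept and a random slope of time, so that each unit's random-effects design has columns $[1, t_{ij}]$. The parameter names `psi11`, `psi22`, `psi21` (random-effect variances and covariance) and `theta` (level-1 residual variance) are hypothetical labels chosen for illustration and are not the chapter's notation. The marginal covariance of one unit's responses is computed at whatever occasions that unit was actually measured. Unbalanced occasions therefore simply yield unit-specific matrices, with no need to drop incomplete units.

```python
# Sketch: marginal covariance of one unit's responses under a linear
# random-intercept-and-slope model (hypothetical parameter names).
from typing import List

Matrix = List[List[float]]


def implied_covariance(times: List[float], psi11: float, psi22: float,
                       psi21: float, theta: float) -> Matrix:
    """Cov(y_ij, y_i'j) for a unit measured at the given occasions.

    Assumes y_ij = fixed part + u_0j + u_1j * t_ij + e_ij, with
    Var(u_0j) = psi11, Var(u_1j) = psi22, Cov(u_0j, u_1j) = psi21, and
    independent level-1 errors e_ij with variance theta.
    """
    n = len(times)
    cov = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            ta, tb = times[a], times[b]
            # Random effects shared by all occasions of a unit induce covariance.
            cov[a][b] = psi11 + psi21 * (ta + tb) + psi22 * ta * tb
            if a == b:
                # The level-1 residual variance only enters the diagonal.
                cov[a][b] += theta
    return cov


def to_correlation(cov: Matrix) -> Matrix:
    """Convert a covariance matrix into a correlation matrix."""
    sd = [cov[k][k] ** 0.5 for k in range(len(cov))]
    return [[cov[a][b] / (sd[a] * sd[b]) for b in range(len(cov))]
            for a in range(len(cov))]


if __name__ == "__main__":
    # Unbalanced occasions: each unit gets its own covariance matrix.
    for unit_times in ([0.0, 1.0, 2.0], [0.0, 2.5]):
        cov = implied_covariance(unit_times, psi11=1.0, psi22=0.5,
                                 psi21=0.2, theta=0.5)
        print(unit_times, [[round(c, 3) for c in row] for row in cov])

    # Random intercept only: every pair of occasions has the same
    # correlation, psi11 / (psi11 + theta).
    corr = to_correlation(implied_covariance([0.0, 1.0, 2.0], 1.0, 0.0, 0.0, 0.5))
    print(round(corr[0][2], 3))
```

Setting `psi22` and `psi21` to zero recovers the exchangeable (compound-symmetry) structure of a random-intercept model. A nonzero slope variance makes the variances and covariances change with time.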
We will use maximum likelihood estimation, which produces consistent estimates if responses are missing at random (MAR) as defined by Rubin [59]. See Chapter 10 [40] for other approaches in the case of MAR, and Verbeke and Molenberghs [65] for approaches in the case of responses not missing at random (NMAR). In this chapter we will consider both linear mixed models and generalized linear mixed models.

A linear mixed model is written in Chapter 1, equation (1.4), as $\mathbf{y}_j = \mathbf{X}_j\boldsymbol{\beta} + \mathbf{Z}_j\boldsymbol{\sigma}_j + \mathbf{e}_j$, where $\mathbf{y}_j$ is the vector of continuous responses for unit $j$. In this book the covariate matrices $\mathbf{X}_j$ and $\mathbf{Z}_j$ are treated as fixed. Extra assumptions are required when these matrices are treated as random; see, for instance, Rabe-Hesketh and Skrondal [54].

A generalized linear mixed model also accommodates non-continuous responses and can be written as $g(E(\mathbf{y}_j \mid \boldsymbol{\sigma}_j)) = \mathbf{X}_j\boldsymbol{\beta} + \mathbf{Z}_j\boldsymbol{\sigma}_j \equiv \boldsymbol{\eta}_j$, where $g(\cdot)$ is a link function and $\boldsymbol{\eta}_j$ is a vector of linear predictors. Conditional on the random effects $\boldsymbol{\sigma}_j$, the elements $y_{ij}$ of $\mathbf{y}_j$ have a distribution from the exponential family and are mutually independent. See Rabe-Hesketh and Skrondal [54] and Chapter 9 [58] for treatments of generalized linear mixed models.

For dichotomous and ordinal responses, generalized linear mixed models with logit and probit links can also be defined using a latent response formulation. In this case a linear mixed model is specified for an imagined continuous latent response $y^*_{ij}$. The observed dichotomous or ordinal response $y_{ij}$ with $S > 1$ categories results from partitioning $y^*_{ij}$ into $S$ segments using $S - 1$ cut-points or thresholds; see Chapter 6 [31] for details.

We will use an example dataset to illustrate some of the ideas discussed in this chapter. The dataset comes from an American panel survey of 545 young males taken from the National Longitudinal Survey (Youth Sample) for the period 1980-1987.
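The latent response formulation can be sketched directly. In this minimal sketch, assume $y^*_{ij} = \eta_{ij} + \epsilon_{ij}$ with increasing cut-points $\kappa_1 < \cdots < \kappa_{S-1}$. Then $y_{ij} = s$ whenever $\kappa_{s-1} < y^*_{ij} \le \kappa_s$, with $\kappa_0 = -\infty$ and $\kappa_S = +\infty$. A standard normal $\epsilon_{ij}$ gives the probit link, and a standard logistic $\epsilon_{ij}$ gives the logit link. The function names, the `link` argument, and the convention of numbering categories from 1 are illustrative choices, not taken from the chapter. The scalar `eta` stands for the full linear predictor, fixed plus random part, for a single observation.

```python
# Sketch: latent-response view of an ordinal (or binary) probit/logit model.
import bisect
import math
from typing import List


def observed_category(y_star: float, cutpoints: List[float]) -> int:
    """Map a latent response y* to an observed category 1..S.

    With S - 1 increasing cut-points, y = s if kappa_{s-1} < y* <= kappa_s.
    """
    return bisect.bisect_left(cutpoints, y_star) + 1


def _std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _logistic_cdf(z: float) -> float:
    # Numerically stable form that avoids overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def category_probabilities(eta: float, cutpoints: List[float],
                           link: str = "probit") -> List[float]:
    """P(y = s | eta) for s = 1..S when y* = eta + error.

    The error is standard normal for link="probit" and standard logistic
    for link="logit".
    """
    if link not in ("probit", "logit"):
        raise ValueError("link must be 'probit' or 'logit'")
    cdf = _std_normal_cdf if link == "probit" else _logistic_cdf
    # Cumulative probabilities P(y <= s) = F(kappa_s - eta), padded with 0 and 1.
    cumulative = [0.0] + [cdf(k - eta) for k in cutpoints] + [1.0]
    return [cumulative[s + 1] - cumulative[s] for s in range(len(cutpoints) + 1)]


if __name__ == "__main__":
    cuts = [-1.0, 0.5, 1.5]  # hypothetical thresholds, S = 4 categories
    print(observed_category(0.2, cuts))
    print(category_probabilities(0.3, cuts, link="logit"))
```

A dichotomous response is the special case with a single cut-point ($S = 2$). Shifting `eta` upward moves probability mass toward the higher categories.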
The data were previously analyzed by Vella and Verbeek [64] and can be downloaded from the web pages of Wooldridge [70] and Rabe-Hesketh and Skrondal [53]. The response variable is the natural logarithm of the hourly wage in US dollars, and the following covariates will be used:

- educ: Years of schooling ($x_{1j}$)
- black: Dummy variable for being black ($x_{2j}$)
- hisp: Dummy variable for being Hispanic ($x_{3j}$)
- labex: Labor market experience (in 2-year periods) ($x_{4ij}$)
- labexsq: Labor market experience squared ($x_{5ij}$)
- married: Dummy variable for being married ($x_{6ij}$)
- union: Dummy variable for being a member of a union ($x_{7ij}$)

The first three covariates are time-constant (indexed by $j$ only), whereas the next four are time-varying (indexed by $i$ and $j$).

© 2008 Springer Science+Business Media, LLC.
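Two practical steps follow from the split between time-constant and time-varying covariates. The first is checking which covariates are constant within units in long-format data. The second is separating the cross-sectional and longitudinal effects of a time-varying covariate such as married or union. One common way to do the second, sketched here as an assumption rather than as the chapter's own procedure, is cluster-mean centering. It replaces $x_{ij}$ by the unit mean $\bar{x}_j$ and the deviation $x_{ij} - \bar{x}_j$, and includes both as covariates so that each gets its own coefficient. The helper names are hypothetical, and the toy rows below are made-up values, not records from the NLSY data.

```python
# Sketch: classify covariates as time-constant and split a time-varying
# covariate into between-unit (mean) and within-unit (deviation) parts.
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def is_time_constant(unit_ids: Sequence[Hashable], x: Sequence[float]) -> bool:
    """True if x takes a single value within every unit (e.g. educ, black)."""
    seen: Dict[Hashable, float] = {}
    for j, value in zip(unit_ids, x):
        if seen.setdefault(j, value) != value:
            return False
    return True


def within_between(unit_ids: Sequence[Hashable],
                   x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (unit means x_bar_j, deviations x_ij - x_bar_j), row by row."""
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for j, value in zip(unit_ids, x):
        totals[j] += value
        counts[j] += 1
    means = [totals[j] / counts[j] for j in unit_ids]
    deviations = [value - m for value, m in zip(x, means)]
    return means, deviations


if __name__ == "__main__":
    # Made-up toy rows in long format (one row per occasion), not NLSY data.
    ids = [1, 1, 1, 2, 2]
    educ = [12.0, 12.0, 12.0, 9.0, 9.0]
    union = [0.0, 1.0, 1.0, 0.0, 0.0]
    print(is_time_constant(ids, educ), is_time_constant(ids, union))
    print(within_between(ids, union))
```

The deviation component is by construction unrelated to any time-constant unit characteristic, which is why its coefficient is often read as a purely longitudinal (within-unit) effect.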

Skrondal, A., & Rabe-Hesketh, S. (2008). Multilevel and related models for longitudinal data. In Handbook of Multilevel Analysis (pp. 275–299). Springer New York. https://doi.org/10.1007/978-0-387-73186-5_7