Generalized Linear Models I: Logistic Regression

Brian Everitt; Sophia Rabe-Hesketh

Book Chapter

DOI: 10.1007/978-1-4757-3285-6_10

Abstract

So far, most of the analyses we have described have been based on linear models that assume normally distributed populations of the response variable and of the error terms from the fitted models. Most linear models are robust to this assumption, although the extent of this robustness is hard to gauge, and transformations can be used to overcome problems with non-normal error terms. There are situations where transformations are not effective in making errors normal (e.g. when the response variable is categorical) and, in any case, it might be better to model the actual data rather than data that have been transformed to meet assumptions. What we need is a modeling technique that allows distributions other than the normal. Such a technique was introduced by Nelder & Wedderburn (1972), further developed by McCullagh & Nelder (1989), and is called generalized linear modeling (GLM). In this chapter, we will examine two common applications of GLMs: logistic regression, used when the response variable is binary, and Poisson regression, used when the response variable represents counts. In the next chapter, we will describe log-linear models, used when both response and predictor variables are categorical and usually arranged in the form of a contingency table.

13.1 Generalized linear models

Generalized linear models (GLMs) have a number of characteristics that make them more generally applicable than the general linear models we have considered so far. One of the most important is that least squares estimation no longer applies and maximum likelihood methods must be used (Chapter 2).

A GLM consists of three components. First is the random component, which is the response variable and its probability distribution (Chapter 1). The probability distribution must be from the exponential family of distributions, which includes normal, binomial, Poisson, gamma and negative binomial. If Y is a continuous variable, its probability distribution might be normal; if Y is binary (e.g. alive or dead), the probability distribution might be binomial; if Y represents counts, then the probability distribution might be Poisson. Probability distributions from the exponential family can be defined by the natural parameter, a function of the mean, and the dispersion parameter, a function of the variance that is required to produce standard errors for estimates of the mean (Hilbe 1994). For distributions like binomial and Poisson, the variance is related to the mean and the dispersion parameter is set to one. For distributions like normal and gamma, the dispersion parameter is estimated separately from the mean and is sometimes called a nuisance parameter.

Second is the systematic component, which represents the predictors (X variables) in the model. These predictors might be continuous and/or categorical, and interactions between predictors, as well as polynomial functions of predictors, can also be included.
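
As a rough illustration of the random and systematic components and of maximum likelihood estimation, the sketch below fits a logistic regression with one predictor to a binary response. It is not taken from the chapter. The function name `fit_logistic`, the data, and the use of Newton-Raphson iterations are all assumptions made for the example, and the logit function is assumed as the connection between the mean and the linear predictor, since this excerpt does not state it.

```python
import math

def fit_logistic(x, y, n_iter=25, tol=1e-10):
    """Fit P(y = 1) = 1 / (1 + exp(-(b0 + b1 * x))) by maximum likelihood.

    Hypothetical helper: Newton-Raphson on the Bernoulli log-likelihood,
    one predictor only. The logit link is an assumption of this sketch.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Systematic component: linear predictor for each observation
        eta = [b0 + b1 * xi for xi in x]
        # Mean of the binary (Bernoulli) random component
        p = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        # Score vector: gradient of the log-likelihood
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix; the variance p(1 - p) depends only on the mean,
        # so no separate dispersion parameter is estimated
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information) * delta = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Made-up data: a continuous predictor and a binary response (e.g. dead/alive)
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, y)
print("intercept:", b0, "slope:", b1)
```

At the maximum likelihood estimates the score equations are satisfied, so the fitted probabilities sum to the observed number of ones. In practice a statistics package would be used instead of hand-written iterations.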

Everitt, B., & Rabe-Hesketh, S. (2001). Generalized Linear Models I: Logistic Regression (pp. 205–222). https://doi.org/10.1007/978-1-4757-3285-6_10
