Regression Models for Ordinal Data

  • Johnson V
  • Albert J

Abstract

In the December 1, 2020, issue of JAMA, Self et al¹ reported a randomized clinical trial that evaluated whether treatment with hydroxychloroquine improved clinical outcomes of adults hospitalized with COVID-19 compared with placebo. The primary outcome was the patient's clinical status 14 days after randomization, assessed with an ordinal 7-category scale ranging from worst ("death") to best ("discharged from the hospital and able to perform normal activities"). The term "ordinal" is applied to an outcome measure whose mutually exclusive categories can be ordered by their clinical preference. The primary outcome was analyzed with a multivariable ordinal logistic regression model, which is a regression model for an ordinal dependent variable. The authors found no statistically significant difference between the hydroxychloroquine and placebo groups in clinical status 14 days after randomization.

Ordinal outcomes are those whose categories (in this case, specific clinical states at 14 days) can be naturally rank ordered, but the degree of difference between categories may not be quantifiable.² For example, "discharged from the hospital and able to perform normal activities" is better than "death," but there is no numerical quantity that measures how much better. Analyses of ordinal outcomes seek to make full use of the available information by exploiting the inherent rank ordering of the categories; important information can be lost if this ordering is ignored. An arbitrary dichotomization of an ordinal outcome (ie, collapsing the ordered categories into 2 levels) sacrifices granularity and reduces statistical power.³ In addition, a composite binary outcome (an indicator of whether any event occurs) treats each component event equally and ignores their relative severity.
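To see how dichotomization discards information, the following minimal Python sketch uses two invented groups of 100 patients each on a 7-category scale; the counts and the helper `proportion_at_or_above` are hypothetical and are not data from Self et al. Both groups have the same share of patients in category 5 or better, so a binary analysis at that cutoff cannot tell them apart, even though their death rates differ sharply.

```python
# Hypothetical counts (100 patients per group) for a 7-category ordinal outcome,
# listed from category 1 (worst, eg "death") to category 7 (best).
# These numbers are invented for illustration; they are not trial data.
group_a = [2, 3, 10, 15, 30, 25, 15]
group_b = [25, 2, 2, 1, 10, 10, 50]


def proportion_at_or_above(counts, cutoff):
    """Share of patients in category `cutoff` or better (categories numbered from 1)."""
    return sum(counts[cutoff - 1:]) / sum(counts)


# Dichotomizing at "category 5 or better" makes the two groups look identical...
print("cutoff 5:", proportion_at_or_above(group_a, 5), proportion_at_or_above(group_b, 5))

# ...even though the groups differ sharply in mortality (category 1).
print("deaths:", group_a[0] / sum(group_a), group_b[0] / sum(group_b))

# Examining every possible cutoff reveals the differences the binary outcome hides.
for cutoff in range(2, len(group_a) + 1):
    print(cutoff, proportion_at_or_above(group_a, cutoff), proportion_at_or_above(group_b, cutoff))
```

With these invented counts, both groups have a 70% "good outcome" rate at the chosen cutoff, but deaths account for 2% of group A and 25% of group B; an analysis that keeps all 7 ordered categories retains that difference.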
Although binary outcomes are simple to analyze using standard logistic regression models, this loss of information is inconsistent with the principles of ethical conduct of research, which require that investigators maximize the possible benefits of research, including the most efficient use of study data. At the same time, an ordinal outcome should not be analyzed as a continuous variable (eg, using a linear regression model), because its categories are not numeric values and the outcome is unlikely to satisfy the assumptions required for such models. However, in some situations, quantitative outcomes can be treated as ordinal outcomes for statistical analysis purposes.⁴

Description of Regression Models for Ordinal Outcomes

Ordinal logistic regression models are tailored for the analysis of ordinal outcomes.⁵ A nonparametric Mann-Whitney test (also known as the Wilcoxon rank sum test) could evaluate the null hypothesis that the distribution of the ordinal outcome is the same across groups, but such a test does not allow adjustment for baseline characteristics. Like the Mann-Whitney test, the ordinal logistic regression model makes few assumptions about the distribution of the outcome across its categories, so it can accommodate outcomes with a skewed distribution. A commonly used ordinal regression model is the cumulative logit model, which consists of a set of logistic regression models, one for each possible binary dichotomization of the ordinal outcome. For example, the 7-category scale used by Self et al¹ has 6 possible dichotomizations (category 1 vs 2–7, categories 1–2 vs 3–7, and so on), and therefore 6 logistic regression models. For each of these models, the association of an independent variable (eg, treatment with hydroxychloroquine vs placebo) is quantified by an odds ratio. Across these logistic regression models, the odds ratios are often assumed to be equal.
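The 6 dichotomizations of a 7-category outcome can be made concrete with a short Python sketch. It assumes invented counts for a treated and a control group and computes the unadjusted odds ratio of being in category j or better for each cutoff j = 2, ..., 7; the function `cumulative_odds_ratios` is a hypothetical helper, and the sketch does not fit a regression model or adjust for covariates.

```python
# Hypothetical counts for a 7-category ordinal outcome, ordered from category 1
# (worst) to category 7 (best). Invented for illustration only.
treated = [5, 5, 10, 10, 20, 20, 30]
control = [10, 10, 10, 10, 20, 20, 20]


def cumulative_odds_ratios(treated_counts, control_counts):
    """Unadjusted odds ratio (treated vs control) of being in category j or better,
    for each of the K - 1 binary dichotomizations j = 2..K of a K-category outcome."""
    ratios = []
    for j in range(1, len(treated_counts)):  # j: 0-based index of the lowest "better" category
        treated_odds = sum(treated_counts[j:]) / sum(treated_counts[:j])
        control_odds = sum(control_counts[j:]) / sum(control_counts[:j])
        ratios.append(treated_odds / control_odds)
    return ratios


for cutoff, ratio in enumerate(cumulative_odds_ratios(treated, control), start=2):
    print(f"category >= {cutoff} vs below: odds ratio {ratio:.2f}")
```

A cumulative logit model that assumes equal odds ratios would summarize these 6 per-cutoff values with a single, covariate-adjusted odds ratio; this sketch only shows what is being summarized.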
The same assumption is made for the associations of any other independent variables (baseline clinical characteristics such as age and sex). Thus, the association of each independent variable with the ordinal outcome is represented by a single odds ratio. Odds ratios greater than 1 indicate more frequent outcomes in the higher-ranked categories, and odds ratios less than 1 indicate more frequent outcomes in the lower-ranked categories. The assumption that the association of each independent variable with the outcome is represented by a single odds ratio is known as the proportional odds assumption. Although statistical hypothesis tests are available to evaluate the proportional odds assumption, these tests can falsely reject the null hypothesis that the assumption is satisfied, leading to the incorrect conclusion that the analysis is invalid. Instead, graphical summaries can be used to evaluate the proportional odds assumption qualitatively.⁶

Other types of ordinal regression models might be preferred in some settings. A partial proportional odds model assumes proportional odds for only a subset of the independent variables. It might be preferred in studies focused on a particular intervention (eg, treatment with hydroxychloroquine vs placebo) for which a single odds ratio is of scientific interest, without requiring proportional odds for other independent variables (eg, baseline clinical characteristics), or in situations in which the proportional odds assumption is violated for 1 or more independent variables.⁷ A continuation ratio model would be preferred when the outcome categories represent successive stages of disease progression. As with any multivariable analysis, the results of an ordinal logistic regression model should be presented along with appropriate summary statistics and graphical summaries.
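In one common parameterization (software packages differ in sign conventions), the proportional odds assumption reads logit P(Y >= j) = alpha_j + beta * x for j = 2, ..., K, so the odds ratio exp(beta) for a change in x is the same at every cutoff. The minimal Python sketch below assumes an invented control-group distribution and a common odds ratio of 1.5, derives the treated-group category probabilities that proportional odds would imply, and checks that all 6 cumulative odds ratios equal 1.5. The function `apply_proportional_odds` is hypothetical.

```python
def apply_proportional_odds(control_probs, odds_ratio):
    """Category probabilities for a group whose odds of being in category j or
    better equal `odds_ratio` times the control group's odds, at every cutoff j."""
    k = len(control_probs)
    # Cumulative probabilities P(Y >= j) for j = 1..K in the control group.
    control_cum = [sum(control_probs[j:]) for j in range(k)]
    treated_cum = [1.0]  # P(Y >= 1) is always 1
    for p in control_cum[1:]:
        odds = odds_ratio * p / (1 - p)  # same multiplicative shift at every cutoff
        treated_cum.append(odds / (1 + odds))
    treated_cum.append(0.0)  # P(Y >= K + 1) is always 0
    # Category probabilities are differences of adjacent cumulative probabilities.
    return [treated_cum[j] - treated_cum[j + 1] for j in range(k)]


# Hypothetical control-group distribution over 7 categories (worst to best).
control = [0.10, 0.10, 0.15, 0.15, 0.20, 0.15, 0.15]
treated = apply_proportional_odds(control, odds_ratio=1.5)

for j in range(1, len(control)):
    t, c = sum(treated[j:]), sum(control[j:])
    print(f"category >= {j + 1}: odds ratio {(t / (1 - t)) / (c / (1 - c)):.3f}")
```

In real data the per-cutoff odds ratios will not be exactly equal; graphical checks of the assumption often plot them (or the corresponding log odds) across cutoffs to judge whether a single odds ratio is a reasonable summary.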
For an ordinal outcome, a stacked bar plot with a sequential color scale, along with percentages for each category, can illustrate the distribution of the outcome in each group (eg, Figure 2 in the study by Self et al¹). Assuming proportional odds, the ordinal logistic regression model provides an adjusted odds ratio and CI comparing the ordinal outcome between groups. With outcome categories ordered from worst to best, an odds ratio greater than 1 indicates better outcomes. The odds ratio represents the proportional association of the independent variable with the odds of better outcomes; this interpretation applies to each dichotomization of the ordinal outcome. A test of the null hypothesis that the odds ratio equals 1 evaluates whether the association is statistically significant.
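Statistical software typically reports the estimated log odds ratio and its standard error. One common way to turn these into an odds ratio, a 95% CI, and a P value for the null hypothesis that the odds ratio equals 1 is a Wald test, sketched below in Python. The estimate and standard error are invented for illustration (they are not results from Self et al), `wald_summary` is a hypothetical helper, and likelihood ratio tests are an alternative some analysts prefer.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def wald_summary(log_odds_ratio, standard_error):
    """Odds ratio, Wald 95% CI, and two-sided P value for H0: odds ratio = 1."""
    z = log_odds_ratio / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    lower = math.exp(log_odds_ratio - Z_95 * standard_error)
    upper = math.exp(log_odds_ratio + Z_95 * standard_error)
    return math.exp(log_odds_ratio), (lower, upper), p_value


# Hypothetical model output: log odds ratio and its standard error.
odds_ratio, (lower, upper), p = wald_summary(math.log(1.5), 0.15)
print(f"OR {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.2f}), P = {p:.4f}")
```

Because the CI is built on the log scale and then exponentiated, it is asymmetric around the odds ratio; a CI that excludes 1 corresponds to a two-sided P value below .05.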

Johnson, V. E., & Albert, J. H. (1999). Regression Models for Ordinal Data (pp. 126–157). https://doi.org/10.1007/0-387-22702-4_4