Multinomial and ordinal Logistic regression analyses with multi-categorical variables using R

Jiaqi Liang; Guoshu Bi; Cheng Zhan

Journal Article, Open Access

Annals of Translational Medicine (2020) 8(16) 982-982

DOI: 10.21037/atm-2020-57

Abstract

Standard linear regression requires a continuous dependent variable. Applying it to a two-level or multi-level outcome can lead to unsatisfactory results, because the validity of the model relies on the variability of the outcome being the same for all values of the predictors, which is contrary to the nature of two-level or multi-level outcomes (1). Therefore, when the dependent variable consists of several categories, a maximum likelihood estimator, such as multinomial logit or probit, should be used instead of the ordinary least squares estimator (2). As a complement to standard linear regression, logistic regression can describe the relationship between one or more independent variables (continuous or not) and a dichotomous or multi-categorical dependent variable. Zhou et al. (3) elaborated a series of reliable methodologies for constructing clinical prediction models in the R software, with detailed steps and operable code examples organized by type of clinical data and category of variable. They summarized the process of constructing practical clinical prediction models (nomograms), including data screening, primary model training, and internal and external validation, which was an extraordinary work and a practical reference in the field of statistics (4-6). Building on this study, Bi et al. (7) introduced a simpler and more accurate prediction model as an extension, designed for extracting polynomial equations and calculating the points of each variable together with survival probabilities. The general objective of logistic regression models, which have been applied in medical research on various diseases, is to predict outcomes from variables in existing data (8).
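For orientation, the two models named in the title are commonly written as below. These are standard textbook forms rather than equations taken from the article, and the symbols are introduced here for illustration only: the outcome $Y$ has categories $1, \dots, J$, $x$ is the predictor vector, category 1 is assumed to be the reference level, and the ordinal form uses the sign convention $\alpha_j - x^\top\beta$ (the one adopted by `polr` in the R package MASS).

```latex
% Multinomial logit: one coefficient vector \beta_j per non-reference category
\log\frac{P(Y = j \mid x)}{P(Y = 1 \mid x)} = x^\top \beta_j ,
\qquad j = 2, \dots, J ,
\qquad\text{so}\qquad
P(Y = j \mid x) = \frac{\exp(x^\top \beta_j)}{1 + \sum_{k=2}^{J} \exp(x^\top \beta_k)} .

% Ordinal (proportional-odds) logit: one shared \beta, ordered cut-points \alpha_j
\log\frac{P(Y \le j \mid x)}{P(Y > j \mid x)} = \alpha_j - x^\top \beta ,
\qquad j = 1, \dots, J - 1 ,
\qquad \alpha_1 < \alpha_2 < \dots < \alpha_{J-1} .
```

The multinomial form estimates a separate set of coefficients for each non-reference category, whereas the ordinal form shares one set across all cut-points, which is why the ordinal model is more compact but rests on the proportional-odds assumption.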
In the study by Zhou and colleagues (3), the authors converted multi-categorical outcomes into dichotomous ones and introduced dichotomous logistic regression using R code. However, multi-categorical outcomes can be used directly in multinomial or ordinal logistic regression analyses in the R software, although the steps are more complicated and the results can be harder to interpret. This study aimed to present the methods and processes for applying multi-categorical variables in logistic regression models in the R software environment. The sample data comprised patients registered in the SEER database in 2015 with a diagnosis of lung adenocarcinoma. Patients with unclear race, primary tumor site, tumor differentiation grade, tumor stage (AJCC, 6th edition), or cause of death were excluded. In total, 6,483 patients met the inclusion and exclusion criteria, of whom 3,000 were randomly selected for the subsequent analysis. Age, sex, race, primary site of the tumor, cell differentiation grade, AJCC stage of the tumor, and history of chemotherapy were chosen as independent variables. The ages of the included patients, ranging from 20 to 100 years, were divided into eight groups, labeled "AgeGroup". Also, several
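The workflow described in this abstract might be sketched in R as follows. This is a minimal illustration on simulated data, not the authors' code: the SEER extract is not reproduced, the 10-year cut points for `AgeGroup` are an assumption (the text only states that there were eight groups), and the variables `Sex`, `Chemo`, `Outcome`, and `Severity` are hypothetical stand-ins. It uses `multinom()` from nnet for the multinomial model and `polr()` from MASS for the ordinal model.

```r
# Sketch on simulated data; variable names and cut points are assumptions.
library(nnet)   # multinom(): multinomial logistic regression
library(MASS)   # polr(): proportional-odds (ordinal) logistic regression

set.seed(2015)
n <- 3000       # same size as the random subsample mentioned in the text

# Hypothetical predictors; ages 20-100 split into eight 10-year groups
age <- sample(20:100, n, replace = TRUE)
dat <- data.frame(
  AgeGroup = cut(age, breaks = seq(20, 100, by = 10), include.lowest = TRUE),
  Sex      = factor(sample(c("Female", "Male"), n, replace = TRUE)),
  Chemo    = factor(sample(c("No", "Yes"), n, replace = TRUE))
)

# Hypothetical unordered three-level outcome (reference level is "A")
dat$Outcome <- factor(sample(c("A", "B", "C"), n, replace = TRUE,
                             prob = c(0.5, 0.3, 0.2)))

# Hypothetical ordered three-level outcome, generated from a latent
# logistic variable that increases with age
latent <- 0.03 * age + rlogis(n)
dat$Severity <- cut(latent, breaks = c(-Inf, 1, 2.5, Inf),
                    labels = c("Low", "Mid", "High"), ordered_result = TRUE)

# Multinomial logistic regression: one row of coefficients per
# non-reference outcome level
fit_multi <- multinom(Outcome ~ AgeGroup + Sex + Chemo, data = dat,
                      trace = FALSE)
or_multi <- exp(coef(fit_multi))   # odds ratios versus reference level "A"

# Ordinal logistic regression: one shared coefficient vector plus cut-points
fit_ord <- polr(Severity ~ AgeGroup + Sex + Chemo, data = dat, Hess = TRUE)
or_ord <- exp(coef(fit_ord))       # cumulative odds ratios

# Predicted category probabilities for the first few patients
p_multi <- predict(fit_multi, newdata = head(dat), type = "probs")
p_ord   <- predict(fit_ord,   newdata = head(dat), type = "probs")
```

Here `or_multi` is a matrix with one row per non-reference outcome level, while `or_ord` is a single vector, reflecting the proportional-odds assumption that each predictor has the same effect at every cut-point of the ordered outcome.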

Citation (APA)

Liang, J., Bi, G., & Zhan, C. (2020). Multinomial and ordinal Logistic regression analyses with multi-categorical variables using R. Annals of Translational Medicine, 8(16), 982–982. https://doi.org/10.21037/atm-2020-57
