Background: Immunoglobulin A nephropathy (IgAN) is a chronic renal disease and the most prevalent glomerulonephritis worldwide. To model a large number of extracted biomarkers and identify those most strongly associated with IgAN, the researchers compared 2 penalized regression methods, LASSO and MCP logistic regression, with the random forest method; all 3 are appropriate for high-dimensional, low-sample-size problems. Methods: Urinary protein profiles for both groups comprised 493 proteins. Profiles for the case group (13 patients with IgAN) and the control group (8 healthy individuals) were obtained using nanoscale liquid chromatography with tandem mass spectrometry. The Mann-Whitney test was used as the univariate analysis, and LASSO, MCP, and random forest were used as multivariate analyses to evaluate the simultaneous effect of biomarkers on IgAN in a high-dimensional, low-sample-size setting. All statistical analyses were performed in R 3.3.2. Results: Although the Mann-Whitney test showed that 144 of the 493 proteins differed significantly between the 2 groups, LASSO, MCP, and random forest selected only 7, 3, and 5 biomarkers, respectively, as effective factors in IgAN. The most influential biomarkers were SULF2 (OR = 0.28) and ALBU (OR = 2.66) in LASSO, A1AT (OR = 73.7) in MCP, and GOLM1 and IBP7 in the random forest method. Conclusions: Because all 3 models correctly distinguished every IgAN patient from the controls, the researchers recommend these models for high-dimensional, low-sample-size datasets.
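To illustrate why LASSO and MCP can select different numbers of biomarkers, the following minimal sketch contrasts the two penalties on a single coefficient. It is not the authors' analysis, which used penalized logistic regression in R. It assumes the textbook closed-form updates for one standardized predictor under squared-error loss, and the function names, the tuning value `lam`, and the MCP concavity parameter `gamma = 3` are illustrative assumptions. The last helper shows how a logistic regression coefficient relates to the odds ratios (OR) reported above.

```python
import math


def soft_threshold(z, lam):
    """LASSO update for one coefficient: shrink z toward zero by lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def mcp_threshold(z, lam, gamma=3.0):
    """MCP update for one coefficient (assumes gamma > 1).

    Small signals are set to zero as in LASSO, moderate signals are
    shrunk less than in LASSO, and large signals (|z| > gamma * lam)
    are left unshrunk.
    """
    if gamma <= 1.0:
        raise ValueError("gamma must be greater than 1")
    if abs(z) > gamma * lam:
        return z
    return soft_threshold(z, lam) / (1.0 - 1.0 / gamma)


def odds_ratio(beta):
    """Convert a logistic regression coefficient to an odds ratio."""
    return math.exp(beta)


if __name__ == "__main__":
    lam = 1.0  # illustrative penalty strength, not taken from the study
    for z in (0.5, 2.0, 5.0):
        print(z, soft_threshold(z, lam), mcp_threshold(z, lam))
    # An OR below 1 corresponds to a negative coefficient.
    print(math.log(0.28), odds_ratio(math.log(0.28)))
```

Because MCP leaves large coefficients unshrunk while still zeroing small ones, it tends to keep fewer variables with larger estimated effects than LASSO, which is consistent in direction with the smaller biomarker set and larger OR reported for MCP.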
Citation:
Almasi, A., Kalantari, S., Hashemian, A., & Majd, T. M. (2018). Penalized regression versus random forest model in analyzing high dimensional proteomic data: Diagnosis of IgA nephropathy. Shiraz E Medical Journal, 19(1). https://doi.org/10.5812/semj.14931