Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression

Abstract

High-dimensional classification studies have become widespread across various domains. The large dimensionality, coupled with the possible presence of data contamination, motivates the use of robust, sparse estimation methods to improve model interpretability and ensure that the majority of observations agree with the underlying parametric model. In this study, we propose a robust and sparse estimator for logistic regression models, which simultaneously tackles the presence of outliers and/or irrelevant features. Specifically, we propose the use of L0-constraints and mixed-integer conic programming techniques to solve the underlying double combinatorial problem in a framework that allows one to pursue optimality guarantees. We use our proposal to investigate the main drivers of honey bee (Apis mellifera) loss through the annual winter loss survey data collected by the Pennsylvania State Beekeepers Association. Previous studies mainly focused on predictive performance; our approach, however, produces a more interpretable classification model and provides evidence for several outlying observations within the survey data. We compare our proposal with existing heuristic methods and non-robust procedures, demonstrating its effectiveness. In addition to the application to honey bee loss, we present a simulation study where our proposal outperforms other methods across most performance measures and settings.
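
One way to read the "double combinatorial problem" mentioned above is as a logistic loss with two cardinality constraints: one on the regression coefficients (feature selection) and one on per-observation shift terms (outlier detection). The following is a minimal sketch under that assumption. The symbols \beta, \phi, k_p, and k_n, and the coding of the labels as y_i in {-1, +1}, are illustrative notation introduced here and are not taken from the abstract.

```latex
% Hypothetical notation: x_i are feature vectors, y_i \in \{-1, +1\} are labels,
% \phi_i is a per-observation shift, and k_p, k_n are user-chosen sparsity bounds.
\min_{\beta \in \mathbb{R}^p,\; \phi \in \mathbb{R}^n}
  \sum_{i=1}^{n} \log\!\bigl(1 + \exp\bigl(-y_i\,(x_i^\top \beta + \phi_i)\bigr)\bigr)
\quad \text{subject to} \quad
\|\beta\|_0 \le k_p, \qquad \|\phi\|_0 \le k_n
```

In this sketch, a nonzero \phi_i marks observation i as a potential outlier, and a nonzero \beta_j marks feature j as selected. Each cardinality constraint can be encoded with binary indicator variables, and each log(1 + exp(.)) term can be represented with exponential-cone constraints, which is why a mixed-integer conic solver is a natural fit for problems of this shape.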

Cite

Insolia, L., Kenney, A., Calovi, M., & Chiaromonte, F. (2021). Robust Variable Selection with Optimality Guarantees for High-Dimensional Logistic Regression. Stats, 4(3), 665–681. https://doi.org/10.3390/stats4030040
