Honest variable selection in linear and logistic regression models via l1 and l1+l2 penalization

Abstract

This paper investigates correct variable selection in finite samples via l1 and l1 + l2 type penalization schemes. The asymptotic consistency of variable selection immediately follows from this analysis. We focus on logistic and linear regression models. The following questions are central to our paper: Given a level of confidence 1 − δ, under which assumptions on the design matrix, for which strength of the signal and for what values of the tuning parameters can we identify the true model at the given level of confidence? Formally, if Î is an estimate of the true variable set I*, we study conditions under which P(Î = I*) ≥ 1 − δ, for a given sample size n, number of parameters M and confidence 1 − δ. We show that in identifiable models, both methods can recover coefficients of size 1/√n, up to small multiplicative constants and logarithmic factors in M and 1/δ. The advantage of the l1 + l2 penalization over the l1 is minor for the variable selection problem, for the models we consider here. Whereas the former estimates are unique, and become more stable for highly correlated data matrices as one increases the tuning parameter of the l2 part, too large an increase in this parameter value may preclude variable selection. © 2008, Institute of Mathematical Statistics. All rights reserved.
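The selection rule the abstract describes, taking Î to be the set of coordinates with a nonzero penalized estimate, can be illustrated for the linear model with a small coordinate-descent sketch. Everything specific here is an assumption made for illustration and is not taken from the abstract: the objective normalization (1/n)‖y − Xβ‖² + 2r‖β‖₁ + c‖β‖₂², the function names, the toy orthogonal design, and the tuning values r and c.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def penalized_least_squares(X, y, r, c=0.0, sweeps=100, tol=1e-12):
    """Coordinate descent for the (assumed) objective
    (1/n)||y - X b||^2 + 2 r ||b||_1 + c ||b||_2^2.
    c = 0 gives the l1 estimator, c > 0 the l1 + l2 estimator.
    Assumes no column of X is identically zero."""
    n, M = len(X), len(X[0])
    beta = [0.0] * M
    resid = list(y)  # residual y - X beta, with beta = 0 at the start
    col_norm = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(M)]
    for _ in range(sweeps):
        max_change = 0.0
        for j in range(M):
            # (1/n) x_j' (partial residual that leaves coordinate j out)
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n
            rho += col_norm[j] * beta[j]
            new = soft_threshold(rho, r) / (col_norm[j] + c)
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def support(beta):
    """Estimated variable set: indices of nonzero coefficients."""
    return [j for j, b in enumerate(beta) if b != 0.0]


# Toy data (hypothetical): orthogonal design with (1/n) X'X = I,
# true variable set I* = {0, 2}, and a small fixed perturbation as noise.
X = [[1, 1, 1, 1],
     [1, -1, 1, -1],
     [1, 1, -1, -1],
     [1, -1, -1, 1]]
beta_true = [2.0, 0.0, -1.5, 0.0]
noise = [0.1, -0.1, 0.05, -0.05]
y = [sum(X[i][j] * beta_true[j] for j in range(4)) + noise[i] for i in range(4)]

lasso_fit = penalized_least_squares(X, y, r=0.1)        # l1 only
enet_fit = penalized_least_squares(X, y, r=0.1, c=0.5)  # l1 + l2
print("l1 support:     ", support(lasso_fit))
print("l1 + l2 support:", support(enet_fit))
print("r too small:    ", support(penalized_least_squares(X, y, r=0.01)))
print("r too large:    ", support(penalized_least_squares(X, y, r=2.5)))
```

On this toy design both penalties select the same variables when r = 0.1, and the l2 term only shrinks the nonzero coefficients. A much smaller r lets the noise coordinates into the selected set and a much larger r empties it, which is why the admissible range of the tuning parameters matters. The example says nothing about correlated designs or the logistic model, which the paper also treats.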

Cite

Bunea, F. (2008). Honest variable selection in linear and logistic regression models via l1 and l1+l2 penalization. Electronic Journal of Statistics, 2, 1153–1194. https://doi.org/10.1214/08-EJS287
