Parallel large scale feature selection for logistic regression

37 Citations · 83 Readers (Mendeley)

Abstract

In this paper we examine the problem of efficient feature evaluation for logistic regression on very large data sets. We present a new forward feature selection heuristic that ranks features by their estimated effect on the resulting model's performance. An approximate optimization, based on backfitting, provides a fast and accurate estimate of each new feature's coefficient in the logistic regression model. Further, the algorithm is highly scalable: it parallelizes simultaneously over both features and records, allowing us to quickly evaluate billions of potential features even on very large data sets.
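The backfitting-style approximation the abstract describes can be sketched as follows: holding the coefficients of the already-selected features fixed, each candidate feature's coefficient is estimated by a one-dimensional Newton optimization over the base model's logits, and the candidate is scored by the resulting log-likelihood gain. This is a minimal illustrative sketch, not the paper's exact algorithm; the function name `sfo_score` and the convergence settings are assumptions for illustration.

```python
import numpy as np

def sfo_score(base_logits, x_new, y, iters=10):
    """Estimate a candidate feature's coefficient via 1-D Newton steps,
    holding the existing model's logits fixed (a backfitting-style
    approximation), and return (coefficient, log-likelihood gain).

    base_logits : logits of the current model on each record
    x_new       : candidate feature values per record
    y           : binary labels in {0, 1}
    """
    beta = 0.0
    for _ in range(iters):
        z = base_logits + beta * x_new
        p = 1.0 / (1.0 + np.exp(-z))
        grad = np.dot(x_new, y - p)                 # dLL/dbeta
        hess = np.dot(x_new * x_new, p * (1.0 - p)) # -d2LL/dbeta2
        if hess < 1e-12:
            break
        beta += grad / hess                          # Newton update

    def log_lik(z):
        # Bernoulli log-likelihood of labels y under logits z
        return np.sum(y * z - np.log1p(np.exp(z)))

    gain = log_lik(base_logits + beta * x_new) - log_lik(base_logits)
    return beta, gain
```

Because each candidate is evaluated independently given the fixed base logits, this scoring step is embarrassingly parallel over features, and the gradient/Hessian sums can additionally be partitioned over records, which is the two-way parallelism the abstract refers to.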

Citation (APA)

Singh, S., Kubica, J., Larsen, S., & Sorokina, D. (2009). Parallel large scale feature selection for logistic regression. In Society for Industrial and Applied Mathematics - 9th SIAM International Conference on Data Mining 2009, Proceedings in Applied Mathematics (Vol. 3, pp. 1165–1176). https://doi.org/10.1137/1.9781611972795.100
