Extending logistic regression models with factorization machines

Abstract

Including categorical variables with many levels in a logistic regression model easily leads to a sparse design matrix. This can result in a large, ill-conditioned optimization problem that causes overfitting, extreme coefficient values and long run times. Inspired by recent developments in matrix factorization, we propose four new strategies for overcoming this problem. Each strategy uses a Factorization Machine that transforms the categorical variables with many levels into a few numeric variables, which are subsequently used in the logistic regression model. Factorization Machines also make it possible to include interactions between the categorical variables with many levels, which often increases model accuracy substantially. The four strategies have been tested on four data sets, demonstrating the superiority of our approach over other methods of handling categorical variables with many levels. In particular, our approach has been used successfully to develop high-quality risk models at the Netherlands Tax and Customs Administration.
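The abstract does not spell out the four strategies, so the following is only a minimal sketch of the general idea under our own assumptions. It uses two hypothetical high-cardinality categorical variables A and B and fits a second-order factorization machine with logistic loss by plain SGD. Each row's levels are then replaced by three numeric columns (each level's linear FM weight plus the FM interaction score), and those columns feed an ordinary logistic regression. The function names `fit_fm`, `fm_features` and `fit_logreg`, the synthetic data and all hyperparameters are illustrative and are not taken from the paper. Only NumPy is used.

```python
import numpy as np

# Illustrative sketch (our assumption), not a verbatim version of the paper's strategies.


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_fm(a, b, y, n_a, n_b, k=2, lr=0.05, epochs=20, reg=1e-3, seed=0):
    """Second-order FM with logistic loss over two one-hot categorical
    variables, fitted with plain SGD. a, b are integer level codes."""
    rng = np.random.default_rng(seed)
    w0, w_a, w_b = 0.0, np.zeros(n_a), np.zeros(n_b)
    V_a = rng.normal(0.0, 0.1, (n_a, k))  # latent factor per level of A
    V_b = rng.normal(0.0, 0.1, (n_b, k))  # latent factor per level of B
    for _ in range(epochs):
        for t in rng.permutation(len(y)):
            i, j = a[t], b[t]
            # Only two features are active (one level of A, one of B), so the
            # FM pairwise sum collapses to the single dot product <V_a[i], V_b[j]>.
            score = w0 + w_a[i] + w_b[j] + V_a[i] @ V_b[j]
            g = sigmoid(score) - y[t]  # derivative of log loss w.r.t. score
            w0 -= lr * g
            w_a[i] -= lr * (g + reg * w_a[i])
            w_b[j] -= lr * (g + reg * w_b[j])
            va_i = V_a[i].copy()  # pre-update value, needed for V_b's gradient
            V_a[i] -= lr * (g * V_b[j] + reg * V_a[i])
            V_b[j] -= lr * (g * va_i + reg * V_b[j])
    return w_a, w_b, V_a, V_b


def fm_features(a, b, w_a, w_b, V_a, V_b):
    """Map each row's two high-cardinality levels to three numeric columns:
    the two linear FM weights and the FM interaction score."""
    inter = np.sum(V_a[a] * V_b[b], axis=1)
    return np.column_stack([w_a[a], w_b[b], inter])


def fit_logreg(X, y, lr=0.5, iters=2000):
    """Logistic regression with intercept by full-batch gradient descent on
    standardized features. Returns coefficients (standardized scale) and
    the final mean log loss."""
    Z = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)
    Zb = np.column_stack([np.ones(len(Z)), Z])
    beta = np.zeros(Zb.shape[1])
    for _ in range(iters):
        p = sigmoid(Zb @ beta)
        beta -= lr * Zb.T @ (p - y) / len(y)
    p = np.clip(sigmoid(Zb @ beta), 1e-12, 1 - 1e-12)
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
    return beta, loss


# Synthetic demo: two categorical variables with many levels.
rng = np.random.default_rng(42)
n, n_a, n_b = 2000, 100, 80
a = rng.integers(0, n_a, n)
b = rng.integers(0, n_b, n)
u_a, u_b = rng.normal(size=n_a), rng.normal(size=n_b)
# Target driven by an A-by-B interaction plus a main effect of A.
logit = 1.5 * u_a[a] * u_b[b] + 0.5 * u_a[a]
y = (rng.random(n) < sigmoid(logit)).astype(float)

w_a, w_b, V_a, V_b = fit_fm(a, b, y, n_a, n_b)
X = fm_features(a, b, w_a, w_b, V_a, V_b)  # X.shape == (2000, 3)
beta, loss = fit_logreg(X, y)
print(X.shape, beta.round(2), round(loss, 3))
```

Because each row has exactly one active level per variable, the FM pairwise term reduces to one k-dimensional dot product. The logistic regression therefore sees three dense columns instead of n_a + n_b sparse dummy variables, and the interaction between A and B is carried by a single numeric feature.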

Citation (APA)

Pijnenburg, M., & Kowalczyk, W. (2017). Extending logistic regression models with factorization machines. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10352 LNAI, pp. 323–332). Springer Verlag. https://doi.org/10.1007/978-3-319-60438-1_32