Continuous variable binning algorithm to maximize information value using genetic algorithm

Abstract

Binning (also called bucketing or discretization) is a commonly used pre-processing technique for continuous predictor variables in machine learning. Guidelines for good binning can be treated as constraints, while certain statistics should be optimized, so we view binning as a constrained optimization problem. This paper presents GAbin, a novel supervised binning algorithm for binary classification problems based on a genetic algorithm, and demonstrates its use on a well-known dataset. The algorithm is inspired by the way a human analyst bins a continuous variable: first, choose an output shape (e.g., monotonic, or best bins in the middle); second, define constraints (e.g., a minimum number of samples in each bin); finally, maximize key statistics that assess the quality of the output bins. GAbin automates these steps, and its results follow the user-desired shape and satisfy the constraints. Experimental results show that GAbin is competitive with other binning algorithms. Moreover, GAbin maximizes information value while satisfying user-desired constraints such as monotonicity or output shape controls.
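
To make the constrained-optimization view concrete, the sketch below scores one candidate set of cut points, which is roughly the role a fitness function would play inside a genetic algorithm. It is not the authors' implementation. It assumes the standard credit-scoring definition of information value (the sum over bins of the difference between the non-event and event shares times the log of their ratio), a binary target coded 0/1, and a simple penalty that returns 0 whenever a constraint is violated. The function names and the `min_samples` and `require_monotonic` parameters are hypothetical.

```python
import math
from bisect import bisect_right


def bin_counts(x, y, cuts):
    """Return [non_events, events] per bin for sorted cut points.

    Bin 0 holds x < cuts[0]; bin i holds cuts[i-1] <= x < cuts[i].
    """
    counts = [[0, 0] for _ in range(len(cuts) + 1)]
    for xi, yi in zip(x, y):
        counts[bisect_right(cuts, xi)][yi] += 1
    return counts


def information_value(counts):
    """Standard IV; returns None if any bin lacks events or non-events."""
    total_good = sum(good for good, _ in counts)
    total_bad = sum(bad for _, bad in counts)
    iv = 0.0
    for good, bad in counts:
        if good == 0 or bad == 0:
            return None  # weight of evidence is undefined for this bin
        pct_good = good / total_good
        pct_bad = bad / total_bad
        iv += (pct_good - pct_bad) * math.log(pct_good / pct_bad)
    return iv


def is_monotonic(values):
    """True if the sequence is non-decreasing or non-increasing."""
    pairs = list(zip(values, values[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def fitness(x, y, cuts, min_samples=2, require_monotonic=True):
    """Score candidate cut points: IV if all constraints hold, else 0.0.

    The zero penalty is an assumption made for this sketch.
    """
    counts = bin_counts(x, y, sorted(cuts))
    if any(good + bad < min_samples for good, bad in counts):
        return 0.0  # minimum-samples constraint violated
    iv = information_value(counts)
    if iv is None:
        return 0.0
    event_rates = [bad / (good + bad) for good, bad in counts]
    if require_monotonic and not is_monotonic(event_rates):
        return 0.0  # output-shape constraint violated
    return iv


# Toy data: the event rate rises with x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]
score = fitness(x, y, cuts=[4.5])
```

A genetic algorithm would then evolve a population of candidate `cuts` lists, using a score like this to select, cross over, and mutate them. Other output shapes, such as a peak in the middle bins, could be handled by swapping `is_monotonic` for a different shape check.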

Cite

Vejkanchana, N., & Kucharoen, P. (2019). Continuous variable binning algorithm to maximize information value using genetic algorithm. In Communications in Computer and Information Science (Vol. 1051 CCIS, pp. 158–172). Springer. https://doi.org/10.1007/978-3-030-32475-9_12
