Continuous variable binning algorithm to maximize information value using genetic algorithm

Nattawut Vejkanchana; Pramote Kucharoen

Conference Proceedings, Open Access

Communications in Computer and Information Science (2019) 1051 CCIS 158-172

DOI: 10.1007/978-3-030-32475-9_12

Abstract

Binning (bucketing or discretization) is a commonly used data pre-processing technique for continuous predictive variables in machine learning. There are guidelines for good binning, which can be treated as constraints, and there are also statistics that should be optimized. Therefore, we view the binning problem as a constrained optimization problem. This paper presents a novel supervised binning algorithm for binary classification problems using a genetic algorithm, named GAbin, and demonstrates its use on a well-known dataset. It is inspired by the way humans bin continuous variables. To bin a variable, first, we choose an output shape (e.g., monotonic, or best bins in the middle). Second, we define constraints (e.g., a minimum number of samples in each bin). Finally, we try to maximize key statistics that assess the quality of the output bins. The algorithm automates these steps. Results from the algorithm follow the user-desired shapes and satisfy the constraints. The experimental results reveal that the proposed GAbin provides competitive results when compared to other binning algorithms. Moreover, GAbin maximizes information value and can satisfy user-desired constraints such as monotonicity or output shape controls.
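The three steps described in the abstract (choose a shape, apply constraints, maximize a statistic) can be illustrated with a small fitness function of the kind a genetic algorithm might score candidate cut points with. This is a minimal sketch, not the authors' GAbin implementation. It assumes the common definitions of weight of evidence, WoE_i = ln(g_i / b_i), and information value, IV = sum over bins of (g_i - b_i) * WoE_i, where g_i and b_i are the shares of all goods (label 0) and all bads (label 1) falling in bin i. The function names, the rule that infeasible candidates score negative infinity, and the requirement that every bin contain both classes are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def bin_counts(values, labels, cuts):
    """Count goods (label 0) and bads (label 1) per bin.

    `cuts` must be sorted; a value equal to a cut point goes to the bin on its right.
    """
    n_bins = len(cuts) + 1
    good, bad = [0] * n_bins, [0] * n_bins
    for v, y in zip(values, labels):
        i = bisect_right(cuts, v)
        if y == 1:
            bad[i] += 1
        else:
            good[i] += 1
    return good, bad


def woe_and_iv(good, bad):
    """Return per-bin weight of evidence and the total information value."""
    total_good, total_bad = sum(good), sum(bad)
    woe, iv = [], 0.0
    for g, b in zip(good, bad):
        pg, pb = g / total_good, b / total_bad
        w = math.log(pg / pb)
        woe.append(w)
        iv += (pg - pb) * w
    return woe, iv


def is_monotonic(seq):
    """True if seq is entirely non-decreasing or entirely non-increasing."""
    pairs = list(zip(seq, seq[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def fitness(values, labels, cuts, min_samples=1, monotonic=True):
    """Hypothetical fitness: IV if all constraints hold, else -inf."""
    good, bad = bin_counts(values, labels, sorted(cuts))
    # Assumed constraints: enough samples per bin, and both classes
    # present in every bin so that the logarithm is defined.
    for g, b in zip(good, bad):
        if g + b < min_samples or g == 0 or b == 0:
            return float("-inf")
    woe, iv = woe_and_iv(good, bad)
    # Shape constraint: WoE must be monotonic across bins.
    if monotonic and not is_monotonic(woe):
        return float("-inf")
    return iv


# Toy data (made up for illustration): one cut point at 4.5.
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(fitness(values, labels, [4.5], min_samples=2))  # ln(3), about 1.0986
```

A genetic algorithm would evolve the list of cut points and keep the candidates with the highest fitness. Returning negative infinity for infeasible candidates is only one way to handle constraints; a penalty term or a repair step are alternatives.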

Cite

Vejkanchana, N., & Kucharoen, P. (2019). Continuous variable binning algorithm to maximize information value using genetic algorithm. In Communications in Computer and Information Science (Vol. 1051 CCIS, pp. 158–172). Springer. https://doi.org/10.1007/978-3-030-32475-9_12
