Exploring uplift modeling with high class imbalance

Otto Nyberg; Arto Klami

Journal Article, Open Access

Data Mining and Knowledge Discovery (2023) 37(2) 736-766

DOI: 10.1007/s10618-023-00917-9

4 Citations

12 Readers

Abstract

Uplift modeling refers to individual-level causal inference. Existing research on the topic ignores one prevalent and important aspect: high class imbalance. For instance, in online environments uplift modeling is used to target ads and discounts optimally, yet very few users ever click an ad or make a purchase. A common way to deal with imbalance in classification is to undersample the dataset. In this work, we show how undersampling can be extended to uplift modeling and propose four undersampling methods for it. We compare the proposed methods empirically and show when some of them tend to break down. One key observation is that accounting for the imbalance is particularly important for uplift random forests, which explains the poor performance of the model in earlier works. Undersampling is also crucial for class-variable transformation based models.
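The abstract does not spell out the four proposed methods, so the following is only a generic sketch of the underlying idea, not the authors' procedure. It assumes that negatives are kept with probability 1/k in both the treatment and the control group, and that a model trained on the undersampled data outputs a probability p* that must be mapped back to the original scale before the two groups are subtracted. All names (undersample_negatives, correct_probability, corrected_uplift, and the row keys "t" and "y") are hypothetical.

```python
import random


def undersample_negatives(rows, k, seed=0):
    """Keep every positive row; keep each negative row with probability 1/k.

    Assumption: each row is a dict with a binary outcome under key "y".
    The same k is applied to treated and control rows alike.
    """
    rng = random.Random(seed)
    kept = []
    for row in rows:
        if row["y"] == 1 or rng.random() < 1.0 / k:
            kept.append(row)
    return kept


def correct_probability(p_star, k):
    """Map a probability estimated on undersampled data back to the original scale.

    If negatives were kept at rate 1/k, then p* = p / (p + (1 - p) / k),
    which inverts to p = p* / (p* + k * (1 - p*)).
    """
    return p_star / (p_star + k * (1.0 - p_star))


def corrected_uplift(p_t_star, p_c_star, k):
    """Uplift as the difference of corrected treatment and control probabilities."""
    return correct_probability(p_t_star, k) - correct_probability(p_c_star, k)
```

With k = 10 and an original positive rate of 0.01, the undersampled rate is roughly 0.092; applying correct_probability to that value recovers 0.01. Without such a correction, the difference between treatment and control predictions would be on a distorted scale.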

Cite

Nyberg, O., & Klami, A. (2023). Exploring uplift modeling with high class imbalance. Data Mining and Knowledge Discovery, 37(2), 736–766. https://doi.org/10.1007/s10618-023-00917-9
