Exploring uplift modeling with high class imbalance


This article is free to access.

Abstract

Uplift modeling refers to individual-level causal inference. Existing research on the topic ignores one prevalent and important aspect: high class imbalance. For instance, in online environments, uplift modeling is used to optimally target ads and discounts, but very few users ever end up clicking an ad or buying. One common approach to dealing with imbalance in classification is undersampling the dataset. In this work, we show how undersampling can be extended to uplift modeling. We propose four undersampling methods for uplift modeling. We compare the proposed methods empirically and show when some of them tend to break down. One key observation is that accounting for the imbalance is particularly important for uplift random forests, which explains the poor performance of that model in earlier work. Undersampling is also crucial for models based on the class-variable transformation.
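The abstract does not spell out the four proposed methods, so the sketch below is not one of them. It is a generic illustration, under stated assumptions, of why undersampling needs a correction step in uplift estimation. Negatives (y = 0) are dropped at an assumed rate `keep_rate` (r). Conversion rates are then estimated separately for the treated and control groups. Each rate is mapped back to the original scale with the standard undersampling correction p = r·p' / (1 − p' + r·p') before taking the difference. All function names here are hypothetical.

```python
import random
from typing import List, Tuple

# Each row is a hypothetical (y, t) pair: y is the binary outcome,
# t is 1 for the treated group and 0 for the control group.
Row = Tuple[int, int]


def undersample_negatives(data: List[Row], keep_rate: float, seed: int = 0) -> List[Row]:
    """Keep every positive row and each negative row with probability keep_rate."""
    rng = random.Random(seed)
    return [(y, t) for (y, t) in data if y == 1 or rng.random() < keep_rate]


def conversion_rate(data: List[Row], group: int) -> float:
    """Fraction of positives among rows belonging to the given treatment group."""
    outcomes = [y for (y, t) in data if t == group]
    return sum(outcomes) / len(outcomes)


def correct_rate(p_sampled: float, keep_rate: float) -> float:
    """Map a positive rate measured after negative undersampling back to the
    original scale: p = r * p' / (1 - p' + r * p')."""
    return keep_rate * p_sampled / (1.0 - p_sampled + keep_rate * p_sampled)


def uplift_after_undersampling(data: List[Row], keep_rate: float, seed: int = 0) -> float:
    """Estimate uplift P(y=1 | t=1) - P(y=1 | t=0) from undersampled data,
    correcting each group's rate before subtracting."""
    sampled = undersample_negatives(data, keep_rate, seed)
    p_treated = correct_rate(conversion_rate(sampled, 1), keep_rate)
    p_control = correct_rate(conversion_rate(sampled, 0), keep_rate)
    return p_treated - p_control


if __name__ == "__main__":
    # Synthetic, highly imbalanced data: 2% conversion when treated, 1% in control.
    data = [(1, 1)] * 200 + [(0, 1)] * 9800 + [(1, 0)] * 100 + [(0, 0)] * 9900
    print(uplift_after_undersampling(data, keep_rate=0.1))  # roughly 0.01; exact value depends on the seed
```

Without `correct_rate`, the two undersampled conversion rates (about 0.17 and 0.09 here) would overstate the uplift by roughly an order of magnitude. This is one reason a naive application of undersampling to uplift modeling can go wrong.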

Citation (APA)

Nyberg, O., & Klami, A. (2023). Exploring uplift modeling with high class imbalance. Data Mining and Knowledge Discovery, 37(2), 736–766. https://doi.org/10.1007/s10618-023-00917-9
