On selection bias with imbalanced classes

Gert Jacobusse; Cor Veenman

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 9956 LNAI, pp. 325–340 (2016)

DOI: 10.1007/978-3-319-46307-0_21

Abstract

In many applications, such as law enforcement and medical screening, one class greatly outnumbers the other, a situation known as class imbalance. Inspections aimed at finding targets from the minority class are usually driven by experience and expert knowledge. This lets inspectors find targets at a rate well above the base rate, which keeps the inspection process feasible. To make the search for targets more efficient, the inspected samples can serve as a training set for a learning method. Because these samples were selected by experts rather than at random, the training set carries a selection bias. In this study, we show how this selection bias can be remedied in several ways using unlabeled data. With a synthetic dataset and a real-world law enforcement dataset, we show that adding unlabeled data to the non-targets strongly improves ranking performance. Importantly, leaving out the labeled non-targets entirely and using only the unlabeled data as non-targets gives the best results.
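The three training configurations compared in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the two-feature synthetic population, the expert rule in expert_selects, the logistic-regression learner, and the use of AUC as the ranking measure are all assumptions made for the example.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

def make_population(n, base_rate):
    """Hypothetical population: two features, targets are the rare class."""
    samples = []
    for _ in range(n):
        y = 1 if rng.random() < base_rate else 0
        shift = 1.0 if y else 0.0  # illustrative separation, not from the paper
        samples.append(((rng.gauss(shift, 1.0), rng.gauss(shift, 1.0)), y))
    return samples

def expert_selects(x):
    """Assumed expert rule: only inspect samples with a high first feature."""
    return x[0] > 1.0

def fit_logistic(train, epochs=100, lr=0.5):
    """Stand-in learner: logistic regression by batch gradient descent."""
    w0, w1, b = 0.0, 0.0, 0.0
    n = len(train)
    for _ in range(epochs):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in train:
            z = max(-30.0, min(30.0, w0 * x0 + w1 * x1 + b))
            err = 1.0 / (1.0 + math.exp(-z)) - y
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= lr * g0 / n
        w1 -= lr * g1 / n
        b -= lr * gb / n
    return lambda x: w0 * x[0] + w1 * x[1] + b  # ranking score

def auc(scores, labels):
    """Ranking quality: chance that a target outscores a non-target."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

population = make_population(2000, base_rate=0.05)
inspected = [(x, y) for x, y in population if expert_selects(x)]     # labeled, biased
unlabeled = [x for x, _ in population if not expert_selects(x)]      # never inspected

targets = [(x, 1) for x, y in inspected if y == 1]
labeled_non_targets = [(x, 0) for x, y in inspected if y == 0]
# Unlabeled samples are treated as non-targets, although a few are hidden targets.
unlabeled_as_non_targets = [(x, 0) for x in unlabeled]

training_sets = {
    "labeled only": targets + labeled_non_targets,
    "labeled + unlabeled non-targets": targets + labeled_non_targets + unlabeled_as_non_targets,
    "targets vs. unlabeled only": targets + unlabeled_as_non_targets,
}

test_set = make_population(2000, base_rate=0.05)  # fresh sample without selection bias
test_labels = [y for _, y in test_set]
results = {}
for name, train in training_sets.items():
    score = fit_logistic(train)
    results[name] = auc([score(x) for x, _ in test_set], test_labels)
    print(f"{name}: AUC = {results[name]:.3f}")
```

The printed AUC values depend entirely on this toy setup and do not reproduce the paper's experiments; the sketch only shows how the three training sets differ in which samples play the role of non-targets.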

Cite

Jacobusse, G., & Veenman, C. (2016). On selection bias with imbalanced classes. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9956 LNAI, pp. 325–340). Springer Verlag. https://doi.org/10.1007/978-3-319-46307-0_21
