Counting positives accurately despite inaccurate classification

George Forman

Conference Proceedings (Open Access)

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2005) 3720 LNAI 564-575

DOI: 10.1007/11564096_55

Abstract

Most supervised machine learning research assumes the training set is a random sample from the target population, so the class distribution is assumed to be invariant. In real-world situations, however, the class distribution changes, and such shifts are known to erode the effectiveness of classifiers and calibrated probability estimators. This paper focuses on the problem of accurately estimating the number of positives in the test set (quantification), as opposed to classifying individual cases accurately. It compares three methods: classify & count, an adjusted variant, and a mixture model. An empirical evaluation on a text classification benchmark reveals that the simple classify & count method is consistently biased, and that the mixture model is surprisingly effective even when positives are very scarce in the training set, a common case in information retrieval. © Springer-Verlag Berlin Heidelberg 2005.
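
To make the three compared methods concrete, here is a minimal Python sketch, not the paper's implementation. It assumes binary labels, classifier scores in [0, 1], and that the true-positive rate (tpr), false-positive rate (fpr), and per-class score samples were estimated on held-out training data (for example by cross-validation). The adjusted variant is written as the common correction that inverts observed = p·tpr + (1 − p)·fpr for the prevalence p. The mixture model shown here is one simple instance (a grid search over p matching score histograms) and may differ from the formulation in the paper. All function names are hypothetical.

```python
# Hypothetical sketch of three quantification estimators: classify & count,
# an adjusted count, and a simple score-histogram mixture model.
# Assumptions: binary labels, scores in [0, 1], tpr/fpr and per-class score
# samples estimated on held-out training data (e.g. cross-validation).
from typing import List, Sequence


def classify_and_count(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of test cases the classifier labels positive."""
    return sum(s >= threshold for s in scores) / len(scores)


def adjusted_count(observed_rate: float, tpr: float, fpr: float) -> float:
    """Solve observed = p*tpr + (1-p)*fpr for p, clipped to [0, 1]."""
    if tpr <= fpr:
        raise ValueError("adjustment requires tpr > fpr")
    p = (observed_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, p))


def histogram(values: Sequence[float], bins: int) -> List[float]:
    """Normalized histogram of scores in [0, 1] using equal-width bins."""
    counts = [0] * bins
    for v in values:
        counts[min(int(v * bins), bins - 1)] += 1  # score 1.0 goes in the last bin
    return [c / len(values) for c in counts]


def mixture_estimate(test_scores: Sequence[float],
                     pos_scores: Sequence[float],
                     neg_scores: Sequence[float],
                     bins: int = 10,
                     steps: int = 1000) -> float:
    """Grid-search the prevalence p whose mixture p*H_pos + (1-p)*H_neg best
    matches the test-score histogram under the L1 distance."""
    h_test = histogram(test_scores, bins)
    h_pos = histogram(pos_scores, bins)
    h_neg = histogram(neg_scores, bins)
    best_p, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        p = i / steps
        err = sum(abs(t - (p * a + (1 - p) * b))
                  for t, a, b in zip(h_test, h_pos, h_neg))
        if err < best_err:
            best_p, best_err = p, err
    return best_p
```

As a hypothetical illustration (these numbers are not results from the paper), a classifier with tpr 0.8 and fpr 0.1 applied to a test set with 5% positives is expected to flag 0.05·0.8 + 0.95·0.1 = 13.5% of cases. Classify & count therefore more than doubles the prevalence, while adjusted_count(0.135, 0.8, 0.1) recovers 0.05 up to floating-point rounding. Multiplying the estimated prevalence by the test-set size gives the estimated number of positives.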

Cite (APA)

Forman, G. (2005). Counting positives accurately despite inaccurate classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3720 LNAI, pp. 564–575). https://doi.org/10.1007/11564096_55