The bane of skew

  • Lampert, T. A.
  • Gançarski, P.

Abstract

While a problem's skew is often assumed to be constant, this paper discusses three settings in which this assumption does not hold. Consequently, incorrectly assuming skew to be constant in these cases results in an over- or underestimation of an algorithm's performance. The area under a precision-recall curve (AUCPR) is a common summary measure used to report the performance of machine learning algorithms. It is well known that precision depends upon class skew, which often varies between datasets. In addition, it is demonstrated herein that under certain circumstances the relative ranking of algorithms (as measured by AUCPR) is not constant and instead also depends upon skew. The skew at which the performance of two algorithms inverts, and the relationship between precision measured at different skews, are defined. This is extended to account for temporal skew characteristics and for situations in which skew cannot be precisely defined. Formal proofs of these findings are presented, desirable properties are proved, and their application is demonstrated.
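
To make the ranking claim concrete, the following minimal sketch (in Python) derives precision-recall curves from fixed ROC operating points via the standard identity precision = π·TPR / (π·TPR + (1 − π)·FPR), where π is the positive-class fraction, and compares trapezoidal AUCPR estimates at two skews. The two classifiers, their operating points, and the helper names pr_from_roc and aucpr are illustrative assumptions, not taken from the paper, and linear interpolation in PR space is a deliberate simplification.

  import numpy as np

  def pr_from_roc(fpr, tpr, pi):
      # Standard identity: recall = TPR and
      # precision = pi*TPR / (pi*TPR + (1 - pi)*FPR),
      # where pi is the positive-class fraction (the skew).
      fpr = np.asarray(fpr, dtype=float)
      tpr = np.asarray(tpr, dtype=float)
      precision = pi * tpr / (pi * tpr + (1.0 - pi) * fpr)
      return tpr, precision

  def aucpr(fpr, tpr, pi):
      # Trapezoidal AUCPR at skew pi; precision at recall 0 is carried
      # over from the first operating point. Linear interpolation in PR
      # space is slightly optimistic, but it suffices to expose the
      # skew-dependent ranking.
      recall, precision = pr_from_roc(fpr, tpr, pi)
      recall = np.concatenate(([0.0], recall))
      precision = np.concatenate(([precision[0]], precision))
      return float(np.sum(np.diff(recall) * (precision[1:] + precision[:-1]) / 2.0))

  # Two hypothetical classifiers described by ROC points, which are
  # themselves skew-invariant: A is precise at low FPR, while B
  # dominates the mid-to-high recall region.
  fpr_a, tpr_a = [0.02, 0.60, 1.0], [0.50, 0.60, 1.0]
  fpr_b, tpr_b = [0.15, 0.30, 1.0], [0.70, 0.95, 1.0]

  for pi in (0.5, 0.01):  # balanced vs. 1 positive per 99 negatives
      a, b = aucpr(fpr_a, tpr_a, pi), aucpr(fpr_b, tpr_b, pi)
      print(f"pi={pi:<4}  AUCPR(A)={a:.3f}  AUCPR(B)={b:.3f}  ->  "
            + ("A ranks higher" if a > b else "B ranks higher"))

With these numbers, B attains the higher AUCPR at π = 0.5 while A attains it at π = 0.01: the ROC points never change, yet the AUCPR ranking inverts purely because the skew does, which is the phenomenon the paper formalises.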

Citation (APA)

Lampert, T. A., & Gançarski, P. (2014). The bane of skew. Machine Learning, 97(1–2), 5–32. https://doi.org/10.1007/s10994-013-5432-x
