Abstract
Let (X, Y) be a random variable consisting of an observed feature vector X ∈ 𝒳 and an unobserved class label Y ∈ {1, 2, …, L} with unknown joint distribution. In addition, let 𝒟 be a training data set consisting of n completely observed independent copies of (X, Y). Usual classification procedures provide point predictors (classifiers) Ŷ(X, 𝒟) of Y or estimate the conditional distribution of Y given X. In order to quantify the certainty of classifying X, we propose to construct for each θ = 1, 2, …, L a p-value πθ(X, 𝒟) for the null hypothesis that Y = θ, treating Y temporarily as a fixed parameter. In other words, the point predictor Ŷ(X, 𝒟) is replaced with a prediction region for Y with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single-use and multiple-use validity, as well as computational and graphical aspects. © 2008, Institute of Mathematical Statistics. All rights reserved.
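To make the idea concrete, here is a minimal sketch of the kind of rank-based nonparametric p-value the abstract alludes to. This is not the authors' specific procedure; it is a generic conformal-style construction under assumed choices: the nonconformity score is the Euclidean distance to the class mean (leave-one-out on training points), and the p-value for candidate label θ is the rank of the test point's score among the class-θ training scores. The prediction region then collects all labels θ with πθ(X, 𝒟) ≥ α.

```python
import numpy as np

def score(x, X_theta):
    """Hypothetical nonconformity score: distance of x to the mean
    of the class-theta training points (an assumed, simple choice)."""
    return np.linalg.norm(x - X_theta.mean(axis=0))

def class_pvalue(x, X_theta):
    """Rank-based p-value for the null hypothesis Y = theta.

    Compares the test point's score with leave-one-out scores of the
    class-theta training points; valid under exchangeability."""
    s_test = score(x, X_theta)
    s_train = np.array([
        score(xi, np.delete(X_theta, i, axis=0))
        for i, xi in enumerate(X_theta)
    ])
    # Fraction of training scores at least as extreme as the test score,
    # with the usual +1 correction so the p-value is never exactly 0.
    return (1 + np.sum(s_train >= s_test)) / (len(X_theta) + 1)

def prediction_region(x, data_by_class, alpha=0.05):
    """All labels theta whose p-value is at least alpha."""
    return [theta for theta, X_theta in data_by_class.items()
            if class_pvalue(x, X_theta) >= alpha]

if __name__ == "__main__":
    # Toy one-dimensional data: two well-separated classes.
    data = {
        0: np.linspace(-1.0, 1.0, 30).reshape(-1, 1),
        1: np.linspace(9.0, 11.0, 30).reshape(-1, 1),
    }
    x_new = np.array([0.0])
    print(class_pvalue(x_new, data[0]))       # large: class 0 plausible
    print(class_pvalue(x_new, data[1]))       # small: class 1 rejected
    print(prediction_region(x_new, data))     # region containing label 0
```

A region produced this way may contain several labels (the data are ambiguous), exactly one label (a confident classification), or none (the test point fits no class at level α), which is the sense in which a point predictor is replaced by a prediction region.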
Dümbgen, L., Igl, B. W., & Munk, A. (2008). P-values for classification. Electronic Journal of Statistics, 2, 468–493. https://doi.org/10.1214/08-EJS245