P-values for classification

Abstract

Let (X, Y) be a random variable consisting of an observed feature vector X ∈ 𝒳 and an unobserved class label Y ∈ {1, 2, …, L} with unknown joint distribution. In addition, let 𝒟 be a training data set consisting of n completely observed independent copies of (X, Y). Usual classification procedures provide point predictors (classifiers) Ŷ(X, 𝒟) of Y or estimate the conditional distribution of Y given X. In order to quantify the certainty of classifying X, we propose to construct for each θ = 1, 2, …, L a p-value π_θ(X, 𝒟) for the null hypothesis that Y = θ, treating Y temporarily as a fixed parameter. In other words, the point predictor Ŷ(X, 𝒟) is replaced with a prediction region for Y with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single use and multiple use validity, as well as computational and graphical aspects. © 2008, Institute of Mathematical Statistics. All rights reserved.
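To make the idea concrete, the sketch below shows one simple nonparametric construction in the spirit of the abstract: a class-conditional, rank-based p-value built from a nearest-neighbour atypicality score. This is a minimal illustration under assumed choices, not the authors' procedure from the paper; the function name class_p_values, the choice of score, and the toy data are all hypothetical.

```python
import numpy as np

def class_p_values(x, X_train, y_train, alpha=0.05):
    """Rank-based p-value pi_theta(x, D) for each candidate class theta.

    Under H0: Y = theta, x is treated as exchangeable with the class-theta
    training points, so the rank of its atypicality score among theirs
    (computed leave-one-out) gives a valid, if slightly conservative,
    nonparametric p-value.
    """
    p_values = {}
    for theta in np.unique(y_train):
        X_theta = X_train[y_train == theta]        # training points with label theta

        def atypicality(z, exclude=None):
            # Distance to the nearest class-theta training point;
            # larger values mean z looks less like class theta.
            d = np.linalg.norm(X_theta - z, axis=1)
            if exclude is not None:
                d[exclude] = np.inf                # leave the point itself out
            return d.min()

        s_x = atypicality(x)
        s_train = np.array([atypicality(X_theta[i], exclude=i)
                            for i in range(len(X_theta))])

        # Proportion of class-theta points at least as atypical as x
        # (the "+1" terms account for x itself under exchangeability).
        p_values[theta] = (1 + np.sum(s_train >= s_x)) / (len(X_theta) + 1)

    # Prediction region with confidence 1 - alpha: keep every class
    # whose null hypothesis Y = theta is not rejected.
    region = [theta for theta, p in p_values.items() if p > alpha]
    return p_values, region


# Hypothetical toy example: two Gaussian classes in the plane.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (30, 2)),
                     rng.normal(3.0, 1.0, (30, 2))])
y_train = np.array([1] * 30 + [2] * 30)
p_vals, region = class_p_values(np.array([1.5, 1.5]), X_train, y_train)
print(p_vals, region)   # an ambiguous point may keep both classes in its region
```

A small p-value π_θ speaks against class θ, and the prediction region {θ : π_θ(x, 𝒟) > α} replaces the usual point prediction Ŷ(x, 𝒟), as described in the abstract.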

Citation (APA)

Dümbgen, L., Igl, B. W., & Munk, A. (2008). P-values for classification. Electronic Journal of Statistics, 2, 468–493. https://doi.org/10.1214/08-EJS245
