K-nearest-neighbours is a simple classifier and, as the size of the training set increases, the accuracy of its class predictions approaches the asymptotic upper bound. Probabilistic classifications can also be generated: with large training sets, the accuracy of a simple proportional scheme is likewise asymptotic to this bound. Outside this limit, however, this and other existing schemes make ineffective use of the available information; this paper proposes a more accurate method that improves on state-of-the-art performance, evaluated on several public data sets. Criteria such as the degree of unanimity among the neighbours, the observed rank of the correct class, and the intra-class confusion matrix are used to tabulate the observed classification accuracy within the (cross-validated) training set. These tables are then used to make probabilistic class predictions for the previously unseen test set, on which the novel and previous methods are evaluated in two ways: i) the mean a posteriori probability, and ii) the accuracy of the discrete prediction obtained by integrating the probabilistic estimates from independent sources. The proposed method performs particularly well in the limit of small training set sizes.
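To make the tabulation idea concrete, the Python sketch below illustrates the first criterion only: it cross-validates a k-NN classifier on the training set, tabulates how often the majority-vote prediction is correct for each observed degree of unanimity u (the number of the k neighbours voting for the winning class), and then uses that table as a calibrated test-time probability. This is a minimal simplification under stated assumptions, not the authors' full method (which also exploits the rank of the correct class and the confusion matrix); the function names, the uniform redistribution of residual probability mass over the non-predicted classes, and the assumption of integer class labels 0..C-1 are all illustrative choices.

```python
# Sketch: calibrating k-NN outputs via the degree-of-unanimity criterion.
# Not the authors' exact scheme; a simplified illustration.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import StratifiedKFold

def unanimity_table(X, y, k=5, n_splits=5, seed=0):
    """Estimate P(prediction correct | u neighbours voted for it), u = 1..k,
    from cross-validated predictions on the training set."""
    correct = np.zeros(k + 1)   # index u = votes received by the winning class
    total = np.zeros(k + 1)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for tr, va in skf.split(X, y):
        knn = KNeighborsClassifier(n_neighbors=k).fit(X[tr], y[tr])
        _, idx = knn.kneighbors(X[va])           # neighbour indices into X[tr]
        for row, true_label in zip(idx, y[va]):
            votes = np.bincount(y[tr][row])      # per-class vote counts
            pred, u = votes.argmax(), votes.max()
            correct[u] += (pred == true_label)
            total[u] += 1
    with np.errstate(invalid="ignore"):
        return np.where(total > 0, correct / total, np.nan)

def predict_proba_calibrated(knn, y_train, table, X_test, k, n_classes):
    """Assign the tabulated probability to the majority class and spread the
    remaining mass uniformly over the other classes (an assumption here)."""
    _, idx = knn.kneighbors(X_test)
    P = np.empty((len(X_test), n_classes))
    for i, row in enumerate(idx):
        votes = np.bincount(y_train[row], minlength=n_classes)
        pred, u = votes.argmax(), votes.max()
        # Fall back to the raw vote proportion if u was never seen in training.
        p = table[u] if np.isfinite(table[u]) else u / k
        P[i] = (1.0 - p) / (n_classes - 1)
        P[i, pred] = p
    return P

# Usage on synthetic data (all names and settings illustrative):
if __name__ == "__main__":
    from sklearn.datasets import make_classification
    X, y = make_classification(n_samples=400, n_classes=3,
                               n_informative=5, random_state=0)
    Xtr, Xte, ytr, yte = X[:300], X[300:], y[:300], y[300:]
    table = unanimity_table(Xtr, ytr, k=5)
    knn = KNeighborsClassifier(n_neighbors=5).fit(Xtr, ytr)
    P = predict_proba_calibrated(knn, ytr, table, Xte, k=5, n_classes=3)
    print("mean a posteriori probability of true class:",
          P[np.arange(len(yte)), yte].mean())
```

The closing lines compute the mean a posteriori probability assigned to the true class, the first of the two evaluation measures mentioned above; per-criterion probability vectors of this kind could then be integrated into a single discrete prediction, as in the second measure.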
D Mallah, C., & Orwell, J. (2013). Probabilistic Classification from a K-Nearest-Neighbour Classifier. Computational Research, 1(1), 1–9. https://doi.org/10.13189/cr.2013.010101