Supervised Learning: Practice and Theory of Classification with the k-NN Rule

  • Nielsen F

Abstract

The k-NN classification rule labels a new query observation q from a test set by choosing the dominant class among the k nearest neighbors of q in the training set. One evaluates the performance of a classifier by calculating its F-score, the harmonic mean of the precision and recall rates. This yields a single quality value that takes into account the four different cases that can occur when classifying observations into one of two classes (true/false positives and true/false negatives). In statistical machine learning, a classifier can never beat the optimal Bayes error (the minimal probability of error), and the 1-NN rule asymptotically guarantees an error of at most twice the Bayes error. Since nearest neighbor queries are decomposable queries, meaning that $\mathrm{NN}_k(q, X_1 \cup X_2) = \mathrm{NN}_k(\mathrm{NN}_k(q, X_1), \mathrm{NN}_k(q, X_2))$, the k-NN classification rule can easily be parallelized on a distributed-memory architecture like a computer cluster. One drawback of the k-NN rule is that it needs to store the entire training set in order to classify new observations.
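
As a concrete illustration of the rule and the metric described above, here is a minimal sketch in Python with NumPy (not code from the chapter); the function names knn_classify and f_score are hypothetical.

    import numpy as np

    def knn_classify(q, X_train, y_train, k=5):
        """Label the query q by the dominant class among its k nearest
        training points (Euclidean distance)."""
        dists = np.linalg.norm(X_train - q, axis=1)
        nn_idx = np.argsort(dists)[:k]            # the k nearest neighbors of q
        labels, counts = np.unique(y_train[nn_idx], return_counts=True)
        return labels[np.argmax(counts)]          # majority vote over the neighbors

    def f_score(y_true, y_pred, positive=1):
        """Harmonic mean of precision and recall for the positive class,
        summarizing the true/false positive/negative counts."""
        tp = np.sum((y_pred == positive) & (y_true == positive))
        fp = np.sum((y_pred == positive) & (y_true != positive))
        fn = np.sum((y_pred != positive) & (y_true == positive))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return (2 * precision * recall / (precision + recall)
                if precision + recall else 0.0)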

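The decomposability property can also be checked numerically. In the sketch below, under the same assumptions (NumPy, Euclidean distance, a hypothetical helper nn_k returning the k closest points), a toy training set is split across two "nodes", each sub-query is answered locally, and the k-NN search is re-run over the merged per-partition candidates; the result matches the search over the full set, which is what makes the rule cluster-friendly.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 2))    # toy training set
    q = np.zeros(2)                   # query point
    k = 5

    def nn_k(q, X, k):
        """The k points of X closest to q, ordered by distance."""
        return X[np.argsort(np.linalg.norm(X - q, axis=1))[:k]]

    # Split the data across two "nodes" and answer each sub-query locally,
    # then re-run the k-NN search on the union of the candidate lists.
    X1, X2 = X[:500], X[500:]
    merged = nn_k(q, np.vstack([nn_k(q, X1, k), nn_k(q, X2, k)]), k)

    # Same answer as searching the whole training set at once: only the
    # per-partition candidates ever need to be communicated and merged.
    assert np.array_equal(merged, nn_k(q, X, k))
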
Citation (APA)

Nielsen, F. (2016). Supervised Learning: Practice and Theory of Classification with the k-NN Rule (pp. 213–229). https://doi.org/10.1007/978-3-319-21903-5_9
