The k-NN classification rule labels a new query observation q from a test set by taking the majority class among the k nearest neighbors of q in the training set. The performance of a classifier is evaluated by its F-score, the harmonic mean of precision and recall. This yields a single quality value that accounts for the four outcomes that can occur when classifying observations into one of two classes (true/false positives and true/false negatives). In statistical machine learning, no classifier can beat the optimal Bayes error (the minimal probability of error), and the 1-NN rule asymptotically guarantees an error at most twice the Bayes error. Since nearest neighbor queries are decomposable queries, meaning that $\mathrm{NN}_k(q, X_1 \cup X_2) = \mathrm{NN}_k(\mathrm{NN}_k(q, X_1), \mathrm{NN}_k(q, X_2))$, the k-NN classification rule can easily be parallelized on a distributed memory architecture such as a computer cluster. One drawback of the k-NN rule is that it needs to store the entire training set in order to classify new observations.
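As a concrete illustration of the rule and of the F-score, here is a minimal sketch in Python/NumPy (the names knn_classify and f_score are hypothetical, not taken from the chapter); it assumes binary labels in {0, 1}, with 1 as the positive class:

```python
import numpy as np

def knn_classify(q, X_train, y_train, k=3):
    """Label query q by majority vote among its k nearest training points."""
    dists = np.linalg.norm(X_train - q, axis=1)  # Euclidean distances to q
    nearest = np.argsort(dists)[:k]              # indices of the k nearest
    votes = y_train[nearest]
    return np.bincount(votes).argmax()           # dominant (majority) class

def f_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy usage: two clusters, one query point near the positive class.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0]])
y = np.array([0, 0, 1, 1])
print(knn_classify(np.array([0.9, 0.9]), X, y, k=3))  # -> 1
```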
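The decomposability identity above suggests a simple merge strategy for distributed data: answer the k-NN query on each partition of the training set independently (on a cluster, each partition would live on a different node) and then re-rank the union of the partial answers. A minimal single-machine sketch of this idea, with hypothetical helper names nn_k and nn_k_merged:

```python
import numpy as np

def nn_k(q, X, k):
    """Return the k rows of X closest to q (a partial NN_k answer)."""
    dists = np.linalg.norm(X - q, axis=1)
    return X[np.argsort(dists)[:k]]

def nn_k_merged(q, parts, k):
    """Merge per-partition answers by re-ranking their union."""
    candidates = np.vstack([nn_k(q, X, k) for X in parts])
    return nn_k(q, candidates, k)

rng = np.random.default_rng(0)
X1, X2 = rng.normal(size=(100, 2)), rng.normal(size=(100, 2))
q = np.zeros(2)
# Same answer whether we search the full set or merge partial results:
assert np.allclose(nn_k(q, np.vstack([X1, X2]), 3), nn_k_merged(q, [X1, X2], 3))
```

The merge is correct because each of the k global nearest neighbors must be among the k nearest neighbors within its own partition, so re-ranking the candidate union recovers the exact answer.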
Nielsen, F. (2016). Supervised Learning: Practice and Theory of Classification with the k-NN Rule. In Introduction to HPC with MPI for Data Science (pp. 213–229). Springer. https://doi.org/10.1007/978-3-319-21903-5_9