With the recent development of artificial intelligence, machine learning and its algorithms have been the subject of many and varied studies. One popular and widely used family of classification algorithms is nearest neighbors, in particular k-nearest neighbors (KNN). The algorithm has three main steps: calculating the distances, selecting the number of neighbors, and performing the classification itself. The value of the parameter k sets the number of neighbors and has a significant impact on the efficiency of the resulting model. This article describes a study of how the method of choosing k, manual or automatic, influences the results. The data sets used in the study were selected to resemble, as closely as possible, the data generated and used by small businesses: heterogeneous, unbalanced, relatively small in volume, and with small training sets. The results suggest that automatically determining the value of k can give results close to the optimal ones. For some of the training data sets tested, deviations are observed in the accuracy rate and in the behavior of well-known KNN modifications as the neighborhood size increases, but one cannot expect the same parameter values (e.g., for k) to be optimal for all data sets.
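The three steps named in the abstract, and the contrast between a manually fixed k and an automatically chosen one, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which distance metric or which automatic selection procedure the study uses, so Euclidean distance and leave-one-out accuracy are assumptions made here for illustration. The function names (`knn_predict`, `loo_accuracy`, `select_k`) and the toy data set are hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Step 1: compute the distance from the query to every training point.
    distances = [(euclidean(x, query), label) for x, label in zip(train_X, train_y)]
    # Step 2: keep the labels of the k closest points.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Step 3: classify by majority vote (ties go to the label seen first,
    # i.e. the one belonging to the closer neighbor).
    return Counter(nearest_labels).most_common(1)[0][0]


def loo_accuracy(X, y, k):
    """Leave-one-out accuracy for a given k (assumed selection criterion)."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            hits += 1
    return hits / len(X)


def select_k(X, y, candidates):
    """Automatic choice: best leave-one-out accuracy, smallest k on ties."""
    return max(candidates, key=lambda k: (loo_accuracy(X, y, k), -k))


# Hypothetical toy data: small and unbalanced (3 points of "a", 4 of "b").
X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
     (5.0, 5.0), (5.2, 4.8), (4.9, 5.1), (5.1, 5.3)]
y = ["a", "a", "a", "b", "b", "b", "b"]

manual_k = 5                           # value picked by hand
auto_k = select_k(X, y, [1, 3, 5])     # value picked by the search

print("manual k:", manual_k, "accuracy:", loo_accuracy(X, y, manual_k))
print("auto k:", auto_k, "accuracy:", loo_accuracy(X, y, auto_k))
```

On this toy set the hand-picked k = 5 is larger than the minority class can support, so the minority points get outvoted, while the searched value avoids that. This only illustrates why the choice of k matters on small, unbalanced data; it says nothing about the results reported in the study.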
CITATION
Mladenova, T., & Valova, I. (2023). Classification with K-Nearest Neighbors Algorithm: Comparative Analysis between the Manual and Automatic Methods for K-Selection. International Journal of Advanced Computer Science and Applications, 14(4), 396–404. https://doi.org/10.14569/IJACSA.2023.0140444