Abstract
The number of neighbours (k) and the distance measure (DM) are commonly tuned to improve kNN performance. This work investigates the joint effect of these parameters, in conjunction with dataset characteristics (DC), on kNN performance. The Euclidean, Chebyshev, Manhattan, Minkowski, and Filtered distances, eleven k values, and four DC were systematically selected for the parameter tuning experiments. Each experiment used 20 iterations, 10-fold cross-validation, and thirty-three datasets randomly selected from the UCI repository. The results show that the average root mean squared error (RMSE) of kNN is significantly affected by the type of task (p<0.05, 14.53% variability effect). DC collectively caused a 74.54% change in mean RMSE values, while k and DM accumulated the least effect, at 25.4%. The interaction effect of tuning k, DC, and DM resulted in DM = 'Minkowski', 3 ≤ k ≤ 20, 7 ≤ target dimension ≤ 9, and sample size (SS) > 9000 as the optimal performance pattern for classification tasks. For regression problems, the experimental configuration should be 7000 ≤ SS ≤ 9000, 4 ≤ number of attributes ≤ 6, and DM = 'Filtered'. The type of task performed is the most influential determinant of kNN performance, followed by DM. The variation in kNN accuracy resulting from changes in k alone occurs only by chance, as it does not follow any consistent pattern, and the joint effect of k with other parameters yielded a statistically insignificant change in mean accuracy (p>0.5). In further work, the discovered patterns will serve as a standard reference for comparative analysis of kNN performance against other classification and regression algorithms.
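The kind of tuning experiment the abstract describes (a grid over distance measures and k values, scored by cross-validated RMSE) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: it covers only a regression task, uses synthetic toy data rather than UCI datasets, assumes order p = 3 for the Minkowski distance, and omits the Filtered distance because the abstract does not define it. All function and variable names are hypothetical.

```python
import math
import random

def minkowski(a, b, p):
    """Minkowski distance of order p (p=1 is Manhattan, p=2 is Euclidean)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def chebyshev(a, b):
    """Chebyshev distance: the largest coordinate-wise difference."""
    return max(abs(x - y) for x, y in zip(a, b))

# Assumption: p=3 stands in for the generic Minkowski case.
DISTANCES = {
    "manhattan": lambda a, b: minkowski(a, b, 1),
    "euclidean": lambda a, b: minkowski(a, b, 2),
    "minkowski_p3": lambda a, b: minkowski(a, b, 3),
    "chebyshev": chebyshev,
}

def knn_predict(train_X, train_y, query, k, dist):
    """kNN regression: mean target of the k nearest training points."""
    pairs = sorted(zip(train_X, train_y), key=lambda pair: dist(pair[0], query))
    nearest = pairs[:k]
    return sum(target for _, target in nearest) / len(nearest)

def cv_rmse(X, y, k, dist, folds=10, seed=0):
    """RMSE pooled over all held-out points of a shuffled k-fold split."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    sq_errors = []
    for f in range(folds):
        test = idx[f::folds]  # every folds-th shuffled index forms one fold
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            pred = knn_predict(train_X, train_y, X[i], k, dist)
            sq_errors.append((pred - y[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def grid_search(X, y, k_values, distances=DISTANCES, folds=10):
    """Cross-validated RMSE for every (distance name, k) combination."""
    return {
        (name, k): cv_rmse(X, y, k, dist, folds)
        for name, dist in distances.items()
        for k in k_values
    }

# Synthetic toy data (an assumption, not one of the study's datasets).
rng = random.Random(42)
X = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
y = [2.0 * a + b + rng.gauss(0, 0.5) for a, b in X]

results = grid_search(X, y, k_values=[1, 3, 5, 10, 20])
best = min(results, key=results.get)
print("lowest RMSE at", best, "=", round(results[best], 3))
```

The study additionally repeats each experiment 20 times and relates the scores to dataset characteristics; a fuller replication would loop this grid over many datasets and feed the resulting RMSE table into a significance analysis.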
Cite
Inyang, U. G., Ijebu, F. F., Osang, F. B., Afolorunso, A. A., Udoh, S. S., & Eyoh, I. J. (2023). A Dataset-Driven Parameter Tuning Approach for Enhanced K-Nearest Neighbour Algorithm Performance. International Journal on Advanced Science, Engineering and Information Technology, 13(1), 380–391. https://doi.org/10.18517/ijaseit.13.1.16706