The K-Nearest Neighbor (K-NN) classifier is a well-known and widely applied method in data mining. Nevertheless, its high computational and memory costs make the classical K-NN infeasible for today’s Big Data analysis applications. To overcome the cost drawbacks of established data mining methods, several distributed alternatives have emerged. Among these, the Hadoop MapReduce distributed ecosystem has attracted significant attention. Recently, several distributed K-NN based classification algorithms have been proposed that were tested in a Hadoop environment and are suitable for emerging data analysis needs. In this work, a new distributed algorithm, Z-KNN, is proposed. It improves the classification accuracy of the K-NN algorithm by exploiting the representativeness relationship among instances belonging to different data classes. The proposed algorithm relies on class representations derived, for each class, from the Z data instances of that class that are closest to the test instance. The Z-KNN algorithm was tested on a physical Hadoop cluster using several real datasets from different application areas. The results of extensive experiments are presented in this paper, and they show that the proposed Z-KNN algorithm is a competitive alternative to other methods recently proposed in the literature.
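The core idea described above (per-class representations built from the Z instances nearest to the test instance) can be illustrated with a minimal single-machine sketch. The abstract does not say how the class representation is derived or how the computation is distributed, so this sketch makes two assumptions: the representation is taken to be the component-wise mean of the Z nearest instances of each class, and the test instance is assigned to the class whose representation is closest in Euclidean distance. The function name `z_nearest_representative_predict` and its parameters are hypothetical, and no Hadoop MapReduce logic is shown.

```python
import math
from collections import defaultdict


def z_nearest_representative_predict(train_points, train_labels, test_point, z):
    """Classify test_point from per-class representations of its z nearest instances.

    Assumption (not from the paper): the class representation is the
    component-wise mean of the z instances of that class closest to test_point.
    """
    if z < 1:
        raise ValueError("z must be at least 1")

    # Group the training instances by class label.
    by_class = defaultdict(list)
    for point, label in zip(train_points, train_labels):
        by_class[label].append(point)

    best_label, best_distance = None, math.inf
    for label, points in by_class.items():
        # The z instances of this class that are closest to the test instance
        # (fewer than z if the class is smaller than z).
        nearest = sorted(points, key=lambda p: math.dist(p, test_point))[:z]
        # Hypothetical class representation: mean of those instances.
        representative = [sum(column) / len(nearest) for column in zip(*nearest)]
        # Keep the class whose representation is closest to the test instance.
        distance = math.dist(representative, test_point)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


# Illustrative toy data, not taken from the paper's experiments.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = z_nearest_representative_predict(points, labels, (0.5, 0.5), z=2)
```

In a MapReduce setting, one plausible split would have mappers compute distances on data partitions and reducers merge the per-class nearest candidates, but that division is also an assumption rather than the authors' stated design.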
CITATION STYLE
TULGAR, T., HAYDAR, A., & ERŞAN, İ. (2018). A Distributed K Nearest Neighbor Classifier for Big Data. Balkan Journal of Electrical and Computer Engineering, 6(2), 105–111. https://doi.org/10.17694/bajece.419551