POD: A Parallel Outlier Detection Algorithm Using Weighted kNN

Yang Ma; Xujun Zhao

Journal Article (Open Access)

IEEE Access (2021), vol. 9, pp. 81765-81777

DOI: 10.1109/ACCESS.2021.3085605

Abstract

Outlier detection algorithms based on k nearest neighbors (kNN) can effectively find outliers in massive data, but most of them adapt poorly to high-dimensional data sets. To account for the importance of individual attributes in the nearest-neighbor search, we propose a weighted kNN query method that uses the Z-order curve to find the kNN. The method first applies information entropy to calculate a weight for each attribute, and then uses the Z-order curve to encode high-dimensional data into Z-values. The weighted kNN of each object are searched according to its Z-value. We then present a novel outlier detection algorithm based on the minimum distance and the average distance between each object and its weighted kNN. On this basis, we propose a parallel outlier detection algorithm, called POD, to improve the efficiency of outlier detection. Finally, we implement and evaluate the POD algorithm on a 10-node Hadoop cluster, using synthetic data and standard UCI data sets. Experimental results show that POD achieves high performance in terms of effectiveness, scalability and extensibility.
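The pipeline described in the abstract (entropy-based attribute weights, Z-order encoding, a neighbor search along the Z-values, and a score built from the minimum and average neighbor distances) can be illustrated with a minimal single-machine sketch. The abstract gives no formulas, so several choices here are assumptions rather than the authors' method: the weights follow the common entropy weight method with equal-width binning, neighbors are drawn from a fixed window around each object in Z-order (an approximate kNN), and the score is simply the sum of the minimum and the mean weighted distance. All function names and parameters are hypothetical, and the Hadoop parallelization is not shown.

```python
import math

def entropy_weights(data, bins=4):
    # ASSUMPTION: entropy weight method, w_j proportional to (1 - normalized entropy),
    # with equal-width binning. The abstract does not state the exact formula.
    dims = len(data[0])
    raw = []
    for j in range(dims):
        col = [row[j] for row in data]
        lo = min(col)
        span = (max(col) - lo) or 1.0
        counts = [0] * bins
        for v in col:
            counts[min(int((v - lo) / span * bins), bins - 1)] += 1
        probs = [c / len(col) for c in counts if c]
        entropy = -sum(p * math.log(p) for p in probs) / math.log(bins)
        raw.append(max(0.0, 1.0 - entropy))
    total = sum(raw)
    if total <= 0:
        return [1.0 / dims] * dims  # fall back to uniform weights
    return [r / total for r in raw]

def quantize(data, bits):
    # Map every attribute onto integer grid cells in [0, 2**bits - 1].
    dims = len(data[0])
    top = (1 << bits) - 1
    lows = [min(r[j] for r in data) for j in range(dims)]
    spans = [(max(r[j] for r in data) - lows[j]) or 1.0 for j in range(dims)]
    return [[int((r[j] - lows[j]) / spans[j] * top) for j in range(dims)]
            for r in data]

def z_value(cells, bits):
    # Z-order (Morton) code: interleave the bits of all coordinates.
    z = 0
    for b in range(bits):
        for d, c in enumerate(cells):
            z |= ((c >> b) & 1) << (b * len(cells) + d)
    return z

def weighted_dist(a, b, w):
    # Weighted Euclidean distance between two objects.
    return math.sqrt(sum(wj * (x - y) ** 2 for wj, x, y in zip(w, a, b)))

def outlier_scores(data, k=3, bits=8, window=None):
    # Requires at least two objects. ASSUMPTION: candidates are the `window`
    # objects on each side in Z-order, so the kNN found are approximate.
    window = window or 2 * k
    w = entropy_weights(data)
    cells = quantize(data, bits)
    order = sorted(range(len(data)), key=lambda i: z_value(cells[i], bits))
    scores = [0.0] * len(data)
    for pos, i in enumerate(order):
        cand = order[max(0, pos - window):pos] + order[pos + 1:pos + 1 + window]
        dists = sorted(weighted_dist(data[i], data[c], w) for c in cand)[:k]
        # ASSUMPTION: score = minimum distance + average distance.
        scores[i] = min(dists) + sum(dists) / len(dists)
    return scores

# Toy data: a 5 x 5 grid of points plus one isolated point appended last.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
points.append((20.0, 20.0))
scores = outlier_scores(points)
print(max(range(len(points)), key=scores.__getitem__))
```

On this toy data the isolated point receives the largest score, since every grid point has close neighbors inside its Z-order window. Sorting by Z-value keeps the candidate set small, which is what makes the search cheap, at the cost of occasionally missing a true nearest neighbor that lies across a Z-curve boundary.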

Citation (APA)

Ma, Y., & Zhao, X. (2021). POD: A Parallel Outlier Detection Algorithm Using Weighted kNN. IEEE Access, 9, 81765–81777. https://doi.org/10.1109/ACCESS.2021.3085605
