Abstract
Outlier detection algorithms based on k nearest neighbors (kNN) can effectively find outliers in massive data, but most of them adapt poorly to high-dimensional data sets. To reflect the importance of individual attributes in the kNN search, we propose a weighted kNN query method that uses the Z-order curve to find the kNN. The method first applies information entropy to calculate the weight of each attribute, and then uses the Z-order curve to encode high-dimensional data into Z-values. The weighted kNN of each object are searched according to its Z-value. A novel outlier detection algorithm is then presented, based on the minimum distance and the average distance between each object and its weighted kNN. On this basis, we propose a parallel outlier detection algorithm called POD to improve the efficiency of outlier detection. Finally, we implement and evaluate the POD algorithm on a 10-node Hadoop cluster, using synthetic data and standard UCI data sets. Experimental results show that POD achieves high performance in terms of effectiveness, scalability and extensibility.
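The pipeline described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not give the entropy formula, the candidate-search rule, or the exact outlier score, so the sketch assumes the attribute weights are supplied by the caller, that kNN candidates are taken from a fixed window of neighbors in Z-order, and that the score is the sum of the minimum and the average weighted distance. The names `z_value`, `approx_weighted_knn`, `outlier_scores`, and the `window` parameter are hypothetical.

```python
import math


def z_value(point, bits=8):
    """Interleave the bits of non-negative integer coordinates (Morton code)."""
    z = 0
    dims = len(point)
    for bit in range(bits):
        for d, coord in enumerate(point):
            z |= ((coord >> bit) & 1) << (bit * dims + d)
    return z


def weighted_distance(a, b, weights):
    """Euclidean distance with one weight per attribute."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def approx_weighted_knn(points, weights, k, window=None):
    """Return, per point index, the sorted distances to its k nearest candidates.

    Candidates are the points within `window` positions in Z-order
    (an assumption of this sketch, so the result is approximate).
    """
    window = window or 2 * k
    order = sorted(range(len(points)), key=lambda i: z_value(points[i]))
    rank = {idx: pos for pos, idx in enumerate(order)}
    result = {}
    for i in range(len(points)):
        pos = rank[i]
        lo, hi = max(0, pos - window), min(len(order), pos + window + 1)
        candidates = [j for j in order[lo:hi] if j != i]
        dists = sorted(weighted_distance(points[i], points[j], weights)
                       for j in candidates)
        result[i] = dists[:k]
    return result


def outlier_scores(knn_dists):
    """Assumed score: minimum plus average distance to the weighted kNN."""
    return {i: min(d) + sum(d) / len(d) for i, d in knn_dists.items()}


# Four clustered points and one distant point; weights are assumed given.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (200, 200)]
weights = [0.5, 0.5]
scores = outlier_scores(approx_weighted_knn(points, weights, k=2))
most_outlying = max(scores, key=scores.get)
```

Sorting by Z-value keeps points that are close in space roughly close in the sorted order, which is what makes a window scan a plausible stand-in for an exact kNN search; in a parallel setting the sorted order could also serve as a partitioning key, although how POD distributes the work is not stated in the abstract.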
Ma, Y., & Zhao, X. (2021). POD: A Parallel Outlier Detection Algorithm Using Weighted kNN. IEEE Access, 9, 81765–81777. https://doi.org/10.1109/ACCESS.2021.3085605