In this paper we propose a new definition of distance-based outliers that assigns each point a weight, namely the sum of the distances from its κ nearest neighbors. Outliers are the points with the largest weights. To compute these weights, we find the κ nearest neighbors of each point quickly and efficiently by linearizing the search space through the Hilbert space-filling curve. The algorithm consists of two phases. The first provides an approximate solution, within a small factor, after executing at most d + 1 scans of the data set at low time complexity, where d is the number of dimensions of the data set. During each scan, the number of candidate points that may belong to the solution set is significantly reduced. The second phase returns the exact solution with a single scan that further examines a small fraction of the data set. Experimental results show that the algorithm always finds the exact solution during the first phase after d̄ ≪ d + 1 steps, and that it scales linearly in both the dimensionality and the size of the data set. © 2002 Springer-Verlag Berlin Heidelberg.
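The weight definition and the idea of searching for neighbors along a Hilbert ordering can be illustrated with a small sketch. It is not the authors' algorithm. It assumes 2-D points, a single unshifted Hilbert curve over a grid with 2^`order` cells per side, and a fixed `window` of positions on each side of a point in curve order. The function names (`knn_weight`, `exact_weights`, `approx_weights`, `hilbert_index_2d`, `top_n_outliers`) and the parameters `window`, `order` and `n` (the number of outliers to report) are hypothetical. Because the approximate search only considers a subset of the other points, each approximate weight is an upper bound on the exact one.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_weight(p: Point, candidates: Sequence[Point], k: int) -> float:
    """Sum of the distances from p to its k nearest points among `candidates`."""
    return sum(sorted(math.dist(p, q) for q in candidates)[:k])


def exact_weights(points: List[Point], k: int) -> List[float]:
    """Brute-force O(N^2) weights: every other point is a candidate neighbor."""
    return [knn_weight(p, points[:i] + points[i + 1:], k) for i, p in enumerate(points)]


def top_n_outliers(weights: List[float], n: int) -> List[int]:
    """Indices of the n points with the largest weights."""
    return sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)[:n]


def hilbert_index_2d(order: int, x: int, y: int) -> int:
    """Position of grid cell (x, y) along a Hilbert curve over a 2^order x 2^order grid."""
    n = 1 << order
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/reflect so the next, finer level is read in canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def approx_weights(points: List[Point], k: int, window: int, order: int = 10) -> List[float]:
    """Upper bounds on the weights, using only `window` points on each side of
    every point in Hilbert order (a simplification, not the paper's scheme)."""
    assert len(points) > k and window >= k, "need at least k candidates per point"
    max_cell = (1 << order) - 1
    lo_x, hi_x = min(p[0] for p in points), max(p[0] for p in points)
    lo_y, hi_y = min(p[1] for p in points), max(p[1] for p in points)

    def cell(v: float, lo: float, hi: float) -> int:
        # Map a coordinate linearly onto the integer grid [0, max_cell].
        return int(round((v - lo) / (hi - lo) * max_cell)) if hi > lo else 0

    keys = [hilbert_index_2d(order, cell(x, lo_x, hi_x), cell(y, lo_y, hi_y))
            for x, y in points]
    ranked = sorted(range(len(points)), key=lambda i: keys[i])  # linearized order
    weights = [0.0] * len(points)
    for pos, i in enumerate(ranked):
        lo, hi = max(0, pos - window), min(len(ranked), pos + window + 1)
        candidates = [points[ranked[j]] for j in range(lo, hi) if j != pos]
        weights[i] = knn_weight(points[i], candidates, k)
    return weights


if __name__ == "__main__":
    # A 5 x 5 grid plus one isolated point at index 25.
    pts = [(float(i), float(j)) for i in range(5) for j in range(5)] + [(20.0, 20.0)]
    print(top_n_outliers(exact_weights(pts, k=3), n=1))             # prints [25]
    print(top_n_outliers(approx_weights(pts, k=3, window=4), n=1))  # prints [25]
```

Sorting by Hilbert index tends to place spatially close points near each other in the linear order, so a small window usually contains good neighbor candidates. With `window` at least N - 1, the approximate weights coincide with the exact ones.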
Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2431 LNAI, pp. 15–27). Springer Verlag. https://doi.org/10.1007/3-540-45681-3_2