Fast outlier detection in high dimensional spaces

Abstract

In this paper we propose a new definition of distance-based outliers that considers, for each point, the sum of the distances to its κ nearest neighbors, called its weight. Outliers are the points with the largest weights. To compute these weights, we find the κ nearest neighbors of each point quickly and efficiently by linearizing the search space through the Hilbert space-filling curve. The algorithm consists of two phases. The first provides an approximate solution, within a small factor, after executing at most d + 1 scans of the data set at low computational cost, where d is the number of dimensions of the data set; during each scan, the number of candidate points that may belong to the solution set is considerably reduced. The second phase returns the exact solution with a single additional scan that examines only a small fraction of the data set. Experimental results show that the algorithm always finds the exact solution during the first phase, after d̄ ≪ d + 1 steps, and that it scales linearly in both the dimensionality and the size of the data set. © 2002 Springer-Verlag Berlin Heidelberg.
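
The weight definition can be made concrete with a brute-force reference sketch. It assumes Euclidean distance (the abstract does not fix a metric) and uses k for the κ above; the function names `knn_weights` and `top_n_outliers` are hypothetical. This is the O(N²d) baseline that a faster method must agree with, not the paper's algorithm.

```python
import numpy as np


def knn_weights(X: np.ndarray, k: int) -> np.ndarray:
    """Weight of each point: sum of Euclidean distances to its k nearest neighbors."""
    # Pairwise distance matrix (brute force, O(N^2 d) time and O(N^2) memory)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(D, np.inf)  # a point is not its own neighbor
    # The k smallest distances in each row (unordered), then their sum
    nearest = np.partition(D, k - 1, axis=1)[:, :k]
    return nearest.sum(axis=1)


def top_n_outliers(X: np.ndarray, k: int, n: int):
    """Indices and weights of the n points with the largest weights."""
    w = knn_weights(X, k)
    order = np.argsort(-w, kind="stable")[:n]
    return order, w[order]


X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [10.0, 10.0]])
print(top_n_outliers(X, k=2, n=1))
```

On this toy data the isolated point at index 4 is ranked first, with weight √162 + √181 (its distances to (1, 1) and (0, 1)), while each corner of the unit square has weight 2.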
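
The abstract does not spell out how the Hilbert curve is used to find neighbors. The sketch below assumes one simple reading: quantize the points onto a grid, sort them by Hilbert index, and take each point's candidate neighbors from a window of m positions on either side in that order. The Hilbert index is computed with Skilling's transpose-based bit-manipulation algorithm, a swapped-in technique that may differ from the paper's; `hilbert_index`, `approx_weights_hilbert`, the window size `m`, and the grid resolution `bits` are all hypothetical names and parameters. Because the candidates are a subset of all points, each resulting weight is an upper bound on the exact weight. The d + 1 scans and the candidate pruning described in the abstract are not reproduced here.

```python
import numpy as np


def hilbert_index(coords, bits):
    """Map integer coordinates in [0, 2**bits) to their position on a d-dimensional
    Hilbert curve (Skilling's transpose-based algorithm)."""
    X = list(coords)
    n = len(X)
    M = 1 << (bits - 1)
    # Undo excess work: invert or exchange low-order bits, one bit level at a time
    Q = M
    while Q > 1:
        P = Q - 1
        for i in range(n):
            if X[i] & Q:
                X[0] ^= P
            else:
                t = (X[0] ^ X[i]) & P
                X[0] ^= t
                X[i] ^= t
        Q >>= 1
    # Gray-encode across dimensions
    for i in range(1, n):
        X[i] ^= X[i - 1]
    t = 0
    Q = M
    while Q > 1:
        if X[n - 1] & Q:
            t ^= Q - 1
        Q >>= 1
    for i in range(n):
        X[i] ^= t
    # Interleave the transposed bits, most significant level first
    h = 0
    for b in range(bits - 1, -1, -1):
        for i in range(n):
            h = (h << 1) | ((X[i] >> b) & 1)
    return h


def approx_weights_hilbert(X, k, m, bits=10):
    """Upper bounds on each point's weight (sum of Euclidean distances to its k
    nearest neighbors), using only the m predecessors and m successors of the
    point in Hilbert order as candidate neighbors."""
    N = len(X)
    assert m >= k and N > k, "need at least k candidates per point"
    # Quantize each coordinate onto a 2**bits grid (hypothetical resolution choice)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0
    grid = ((X - lo) / span * ((1 << bits) - 1)).astype(np.int64)
    keys = [hilbert_index(row.tolist(), bits) for row in grid]
    order = sorted(range(N), key=keys.__getitem__)
    w = np.empty(N)
    for pos, i in enumerate(order):
        # Candidates: neighbors of i along the curve, excluding i itself
        window = order[max(0, pos - m):pos] + order[pos + 1:pos + 1 + m]
        dists = np.sqrt(((X[window] - X[i]) ** 2).sum(axis=1))
        w[i] = np.sort(dists)[:k].sum()
    return w
```

The Hilbert keys are Python integers, so d · bits may exceed 64 without overflow. A larger m tightens the bounds at higher cost; with m ≥ N − 1 every other point is a candidate and the weights become exact.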
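
The idea of turning an approximate answer into the exact one while examining only a fraction of the data can be illustrated with a generic upper-bound pruning step. This is a swapped-in technique, not the paper's second phase: it assumes only that `upper[i]` is at least the true weight of point i for every i (as holds for weights computed from a subset of candidate neighbors), visits points in decreasing order of their bound, and stops once the n-th best exact weight found so far is at least the next bound. The names `exact_weight` and `refine_top_n` are hypothetical, and Euclidean distance is assumed.

```python
import heapq

import numpy as np


def exact_weight(X, i, k):
    """Brute-force weight of point i: sum of Euclidean distances to its k nearest neighbors."""
    d = np.sqrt(((X - X[i]) ** 2).sum(axis=1))
    d[i] = np.inf  # exclude the point itself
    return np.partition(d, k - 1)[:k].sum()


def refine_top_n(X, upper, k, n):
    """Exact top-n points by weight, given upper[i] >= true weight of point i."""
    order = np.argsort(-upper, kind="stable")
    heap = []  # min-heap of (weight, index) holding the current top-n
    evaluated = 0
    for i in order:
        # Remaining bounds are no larger than upper[i], so no later point can
        # exceed the n-th best exact weight once this test succeeds
        if len(heap) == n and heap[0][0] >= upper[i]:
            break
        w = exact_weight(X, i, k)
        evaluated += 1
        if len(heap) < n:
            heapq.heappush(heap, (w, int(i)))
        elif w > heap[0][0]:
            heapq.heapreplace(heap, (w, int(i)))
    top = sorted(heap, reverse=True)
    return [i for _, i in top], [w for w, _ in top], evaluated
```

The tighter the bounds, the earlier the loop stops: when every bound equals the true weight, exactly n points are evaluated.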

Citation (APA)

Angiulli, F., & Pizzuti, C. (2002). Fast outlier detection in high dimensional spaces. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2431 LNAI, pp. 15–27). Springer Verlag. https://doi.org/10.1007/3-540-45681-3_2