High-Dimensional Outlier Detection: The Subspace Method

  • Aggarwal C

Abstract

"In view of all that we have said in the foregoing sections, the many obstacles we appear to have surmounted, what casts the pall over our victory celebration? It is the curse of dimensionality, a malediction that has plagued the scientist from the earliest days." – Richard Bellman

5.1 Introduction

Many real data sets are very high dimensional. In some scenarios, real data sets may contain hundreds or thousands of dimensions. With increasing dimensionality, many of the conventional outlier detection methods do not work very effectively. This is an artifact of the well-known curse of dimensionality. In high-dimensional space, the data becomes sparse, and the true outliers become masked by the noise effects of multiple irrelevant dimensions, when analyzed in full dimensionality.

A main cause of the dimensionality curse is the difficulty in defining the relevant locality of a point in the high-dimensional case. For example, proximity-based methods define locality with the use of distance functions on all the dimensions. On the other hand, all the dimensions may not be relevant for a specific test point, which also affects the quality of the underlying distance functions [263]. For example, all pairs of points are almost equidistant in high-dimensional space. This phenomenon is referred to as data sparsity or distance concentration. Since outliers are defined as data points in sparse regions, this results in a poorly discriminative situation where all data points are situated in almost equally sparse regions in full dimensionality.

The challenges arising from the dimensionality curse are not specific to outlier detection. It is well known that many problems such as clustering and similarity search experience qualitative challenges with increasing dimensionality [5, 7, 121, 263]. In fact, it has been suggested that almost any algorithm that is based on the notion of proximity would degrade qualitatively in higher-dimensional space, and would therefore need to
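
The distance concentration effect described in the abstract can be illustrated with a small experiment. The following is a minimal sketch, not taken from the chapter: it assumes points drawn uniformly from the unit hypercube, and the helper name `relative_contrast` and the sample sizes are illustrative choices. It measures how much the farthest point differs from the nearest one, relative to the nearest distance, as the number of dimensions grows.

```python
import math
import random


def relative_contrast(n_points, n_dims, rng):
    """Return (d_max - d_min) / d_min for Euclidean distances
    from one random query point to n_points random points.

    Assumption (not from the source): coordinates are i.i.d. uniform on [0, 1].
    """
    query = [rng.random() for _ in range(n_dims)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(n_dims)]
        dists.append(math.dist(query, point))
    return (max(dists) - min(dists)) / min(dists)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
contrasts = {d: relative_contrast(200, d, rng) for d in (2, 10, 100, 1000)}
for d, c in contrasts.items():
    print(f"dims={d:5d}  relative contrast={c:.3f}")
```

Under these assumptions the relative contrast should shrink markedly as the dimensionality increases, which is one way to see why a full-dimensional distance function separates outliers from inliers poorly. Real data with correlated or irrelevant attributes may behave differently from this uniform toy setting.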

Cite

Aggarwal, C. C. (2017). High-Dimensional Outlier Detection: The Subspace Method. In Outlier Analysis (pp. 149–184). Springer International Publishing. https://doi.org/10.1007/978-3-319-47578-3_5
