A spectral clustering-based dataset structure analysis and outlier detection progress

Lin Hai; Zhu Qingsheng

Conference Proceedings


Lecture Notes in Electrical Engineering (2012) 154 LNEE 699-708

DOI: 10.1007/978-1-4471-2386-6_91


Abstract

Data mining is the process of extracting valid, previously unknown, and ultimately comprehensible information from large datasets and using it for organizational decision-making. Clustering is one of the most useful tasks in the data mining process for discovering groups and identifying interesting distributions and patterns in the underlying data. In clustering there are no predefined classes and no examples showing which relations should hold among the data, which is why it is regarded as an unsupervised process. Classification, by contrast, is the procedure of assigning a data item to one of a predefined set of categories. Clustering produces the initial categories into which the values of a data set are classified during the classification process. The clustering process may yield different partitionings of a data set, depending on the specific criterion used for clustering. In general, clustering algorithms are based on a criterion for assessing the quality of a given partitioning. They take some parameters as input (e.g. the number of clusters or the density of clusters) and attempt to find the best partitioning of a data set for those parameters. Thus they define a partitioning of a data set based on certain assumptions, and it is not necessarily the "best" one for the data set. Because clustering algorithms discover clusters that are not known a priori, the final partition of a data set requires some form of evaluation in most applications. Approaches to validating clustering results are discussed in the literature; they aim at the quantitative evaluation of the results of clustering algorithms and are known under the general term cluster validity methods. Many data mining algorithms treat outliers as noise that must be eliminated because it degrades their predictive accuracy. However, as has been pointed out, "one person's noise could be another person's signal".
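The abstract does not name a particular cluster validity index. As one illustration only (not necessarily the index used by the authors), the silhouette coefficient scores a partitioning by comparing, for each point, its mean distance to its own cluster (a) with its mean distance to the nearest other cluster (b), giving (b - a) / max(a, b). The sketch below assumes Euclidean distance and integer cluster labels; the function name `silhouette` is hypothetical.

```python
import math
from typing import Sequence

Point = Sequence[float]


def silhouette(points: list[Point], labels: list[int]) -> float:
    """Mean silhouette coefficient of a partitioning, in [-1, 1]; higher is better.

    Assumption: Euclidean distance; a point alone in its cluster scores 0.
    """
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    n = len(points)
    scores = []
    for i in range(n):
        # Accumulate the distance from point i to every other point, per cluster.
        sums: dict[int, float] = {}
        counts: dict[int, int] = {}
        for j in range(n):
            if i == j:
                continue
            d = math.dist(points[i], points[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if counts.get(own, 0) == 0:
            scores.append(0.0)  # singleton cluster
            continue
        a = sums[own] / counts[own]  # cohesion: mean distance within own cluster
        b = min(sums[c] / counts[c] for c in sums if c != own)  # separation
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return sum(scores) / n
```

For two well-separated pairs of points such as (0, 0), (0, 1), (10, 0), (10, 1), the labelling [0, 0, 1, 1] scores about 0.9, whereas the mixed labelling [0, 1, 0, 1] scores below zero, flagging it as a poor partitioning. Comparing such scores across candidate parameter values (e.g. different numbers of clusters) is one common way to evaluate the partitionings an algorithm produces.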
Outlier mining can be used in telecom or credit card frauds to detect the atypical usage of telecom services or credit cards, in medical analysis to test abnormal reactions to new medical therapies, and in marketing and customer segmentations to identify customers spending much more or much less than the average customer. © 2012 Springer-Verlag London Limited.
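The abstract does not describe the detection procedure itself. As a generic illustration (a swapped-in distance-based technique, not the spectral clustering approach of the paper), each point can be scored by its distance to its k-th nearest neighbour, so points far from the bulk of the data, such as customers spending much more or much less than average, rank highest. The function names, the choice of k, and the spending figures below are hypothetical.

```python
import math
from typing import Sequence


def knn_outlier_scores(points: list[Sequence[float]], k: int) -> list[float]:
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    Larger scores mean the point lies farther from the rest of the data.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points: list[Sequence[float]], k: int, n: int) -> list[int]:
    """Return the indices of the n highest-scoring (most atypical) points."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Hypothetical monthly spending of seven customers (one feature each).
spending = [(120,), (135,), (110,), (128,), (125,), (940,), (5,)]
print(top_outliers(spending, k=2, n=2))  # prints [5, 6]
```

Both the unusually high spender (index 5) and the unusually low one (index 6) are flagged. Using the k-th rather than the nearest neighbour makes the score less sensitive to a pair of outliers that happen to sit close to each other.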

Citation (APA)

Hai, L., & Qingsheng, Z. (2012). A spectral clustering-based dataset structure analysis and outlier detection progress. In Lecture Notes in Electrical Engineering (Vol. 154 LNEE, pp. 699–708). https://doi.org/10.1007/978-1-4471-2386-6_91
