Clustering Via Decision Tree Construction

B. Liu; Y. Xia; P.S. Yu

Book Chapter

DOI: 10.1007/11362197_5

Abstract

Clustering is an exploratory data analysis task. It aims to find the intrinsic structure of data by organizing data objects into similarity groups, or clusters. It is often called unsupervised learning because no class labels denoting an a priori partition of the objects are given. This contrasts with supervised learning (e.g., classification), in which the data objects are already labeled with known classes. Past research in clustering has produced many algorithms. However, these algorithms have some shortcomings. In this paper, we propose a novel clustering technique based on a supervised learning technique called decision tree construction. The new technique is able to overcome many of these shortcomings. The key idea is to use a decision tree to partition the data space into cluster (or dense) regions and empty (or sparse) regions (which produce outliers and anomalies). We achieve this by introducing virtual data points into the space and then applying a modified decision tree algorithm. The technique is able to find “natural” clusters in large, high-dimensional spaces efficiently. It is suitable for clustering in the full-dimensional space as well as in subspaces. It also provides easily comprehensible descriptions of the resulting clusters. Experiments on both synthetic and real-life data show that the technique is effective and scales well to large, high-dimensional datasets.
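The key idea in the abstract can be illustrated with a small Python sketch. It is not the chapter's algorithm: the abstract does not spell out how the decision tree is modified, so this version simply samples the virtual points uniformly inside the bounding box of the data and grows an ordinary information-gain tree that separates real points from virtual ones. All names (`sample_virtual`, `best_split`, `build_tree`, `is_dense`, `demo`) and all parameter values are hypothetical choices made for illustration.

```python
import math
import random

def entropy(n_real, n_virtual):
    """Binary entropy (in bits) of a region holding real and virtual points."""
    total = n_real + n_virtual
    h = 0.0
    for count in (n_real, n_virtual):
        if count:
            p = count / total
            h -= p * math.log2(p)
    return h

def sample_virtual(points, n, rng):
    """Assumption: virtual points are uniform in the data's bounding box."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return [tuple(rng.uniform(lows[d], highs[d]) for d in dims) for _ in range(n)]

def best_split(points, labels):
    """Return (gain, dim, threshold) for the best axis-aligned cut, or None."""
    n, n_real = len(points), sum(labels)
    base = entropy(n_real, n - n_real)
    best = None
    for dim in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][dim])
        left_real = 0
        for rank in range(1, n):
            left_real += labels[order[rank - 1]]
            lo, hi = points[order[rank - 1]][dim], points[order[rank]][dim]
            if lo == hi:
                continue  # cannot cut between identical coordinates
            right_real = n_real - left_real
            remainder = (rank * entropy(left_real, rank - left_real)
                         + (n - rank) * entropy(right_real, n - rank - right_real)) / n
            if best is None or base - remainder > best[0]:
                best = (base - remainder, dim, (lo + hi) / 2)
    return best

def build_tree(points, labels, depth=0, max_depth=10, min_size=10):
    """Grow a tree whose leaves are marked dense (cluster) or sparse (empty)."""
    n_real = sum(labels)
    n_virtual = len(labels) - n_real
    split = None
    if depth < max_depth and len(points) >= min_size and n_real and n_virtual:
        split = best_split(points, labels)
    if split is None or split[0] <= 1e-9:
        return {"dense": n_real > n_virtual, "real": n_real, "virtual": n_virtual}
    _, dim, thr = split
    left = [i for i, p in enumerate(points) if p[dim] <= thr]
    right = [i for i, p in enumerate(points) if p[dim] > thr]
    return {
        "dim": dim,
        "thr": thr,
        "left": build_tree([points[i] for i in left], [labels[i] for i in left],
                           depth + 1, max_depth, min_size),
        "right": build_tree([points[i] for i in right], [labels[i] for i in right],
                            depth + 1, max_depth, min_size),
    }

def is_dense(tree, point):
    """Follow the cuts down to a leaf and report whether it is a dense region."""
    while "dim" in tree:
        tree = tree["left"] if point[tree["dim"]] <= tree["thr"] else tree["right"]
    return tree["dense"]

def demo(seed=0):
    """Two synthetic 2-D blobs; returns (share of real points in dense leaves,
    share of uniform probe points in dense leaves)."""
    rng = random.Random(seed)
    real = [(rng.gauss(c, 0.03), rng.gauss(c, 0.03))
            for c in (0.2, 0.8) for _ in range(100)]
    virtual = sample_virtual(real, len(real), rng)
    tree = build_tree(real + virtual, [1] * len(real) + [0] * len(virtual))
    probes = sample_virtual(real, 1000, rng)
    real_share = sum(is_dense(tree, p) for p in real) / len(real)
    probe_share = sum(is_dense(tree, p) for p in probes) / len(probes)
    return real_share, probe_share
```

Calling `demo()` returns two fractions: how many of the real points land in dense leaves, and how much of the bounding box the dense leaves cover. Because every leaf is an axis-aligned box, each dense region can be read off as a conjunction of interval conditions on the attributes, which is one way to read the abstract's remark about comprehensible cluster descriptions.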

Cite

Liu, B., Xia, Y., & Yu, P. S. (2005). Clustering Via Decision Tree Construction (pp. 97–124). https://doi.org/10.1007/11362197_5
