Clustering Via Decision Tree Construction

  • Liu B
  • Xia Y
  • Yu P

Abstract

Clustering is an exploratory data analysis task. It aims to find the intrinsic structure of data by organizing data objects into similarity groups or clusters. It is often called unsupervised learning because no class labels denoting an a priori partition of the objects are given. This is in contrast with supervised learning (e.g., classification), for which the data objects are already labeled with known classes. Past research in clustering has produced many algorithms. However, these algorithms have some shortcomings. In this paper, we propose a novel clustering technique, which is based on a supervised learning technique called decision tree construction. The new technique is able to overcome many of these shortcomings. The key idea is to use a decision tree to partition the data space into cluster (or dense) regions and empty (or sparse) regions (which produce outliers and anomalies). We achieve this by introducing virtual data points into the space and then applying a modified decision tree algorithm for the purpose. The technique is able to find “natural” clusters in large, high-dimensional spaces efficiently. It is suitable for clustering in the full-dimensional space as well as in subspaces. It also provides easily comprehensible descriptions of the resulting clusters. Experiments on both synthetic data and real-life data show that the technique is effective and also scales well for large, high-dimensional datasets.
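The key idea in the abstract (label the real points as one class, add virtual points as a second class, and let a decision tree separate dense regions from sparse ones) can be illustrated with a small sketch. This is not the paper's modified decision tree algorithm: it assumes the virtual points are sampled uniformly over the bounding box of the data, uses a plain entropy-based split, and stops at a fixed depth. All names (`dense_regions`, `best_split`, `build`) and parameters (`n_virtual`, `max_depth`, `min_gain`) are hypothetical and chosen only for illustration.

```python
import math
import random

def entropy(n_real, n_virtual):
    """Binary entropy of a real-vs-virtual label mix."""
    total = n_real + n_virtual
    h = 0.0
    for n in (n_real, n_virtual):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

def best_split(points):
    """Return (gain, dim, threshold) of the best axis-aligned cut, or None."""
    n = len(points)
    n_real = sum(label for _, label in points)
    base = entropy(n_real, n - n_real)
    best = None
    for dim in range(len(points[0][0])):
        ordered = sorted(points, key=lambda p: p[0][dim])
        left_real = 0
        for i in range(1, n):
            left_real += ordered[i - 1][1]
            lo, hi = ordered[i - 1][0][dim], ordered[i][0][dim]
            if lo == hi:
                continue  # cannot cut between equal coordinates
            right_real = n_real - left_real
            after = (i * entropy(left_real, i - left_real)
                     + (n - i) * entropy(right_real, n - i - right_real)) / n
            gain = base - after
            if best is None or gain > best[0]:
                best = (gain, dim, (lo + hi) / 2)
    return best

def build(points, bounds, depth, max_depth, min_gain):
    """Recursively split; return leaves as (bounds, n_real, n_virtual)."""
    n_real = sum(label for _, label in points)
    split = best_split(points) if depth < max_depth and len(points) > 1 else None
    if split is None or split[0] < min_gain:
        return [(bounds, n_real, len(points) - n_real)]
    _, dim, thr = split
    left = [p for p in points if p[0][dim] <= thr]
    right = [p for p in points if p[0][dim] > thr]
    lb = [(b[0], thr) if d == dim else b for d, b in enumerate(bounds)]
    rb = [(thr, b[1]) if d == dim else b for d, b in enumerate(bounds)]
    return (build(left, lb, depth + 1, max_depth, min_gain)
            + build(right, rb, depth + 1, max_depth, min_gain))

def dense_regions(real, n_virtual=None, seed=0, max_depth=4, min_gain=0.01):
    """Return leaf regions where real points outnumber virtual points."""
    rng = random.Random(seed)
    dims = len(real[0])
    bounds = [(min(p[d] for p in real), max(p[d] for p in real))
              for d in range(dims)]
    n_virtual = n_virtual or len(real)
    # Assumption: virtual points are drawn uniformly over the bounding box.
    virtual = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
               for _ in range(n_virtual)]
    points = [(p, 1) for p in real] + [(p, 0) for p in virtual]
    leaves = build(points, bounds, 0, max_depth, min_gain)
    return [(b, r, v) for b, r, v in leaves if r > v]
```

Each returned region is a list of per-dimension `(low, high)` intervals together with its real and virtual point counts, which gives a simple, readable description of a cluster as a hyper-rectangle. Leaves dominated by virtual points correspond to the sparse regions mentioned in the abstract.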

Cite

Liu, B., Xia, Y., & Yu, P. S. (2005). Clustering Via Decision Tree Construction (pp. 97–124). https://doi.org/10.1007/11362197_5
