ConDist: A context-driven categorical distance measure

7Citations
Citations of this article
18Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

A distance measure between objects is a key requirement for many data mining tasks like clustering, classification or outlier detection. However, for objects characterized by categorical attributes, defining meaningful distance measures is a challenging task since the values within such attributes have no inherent order, especially without additional domain knowledge. In this paper, we propose an unsupervised distance measure for objects with categorical attributes based on the idea that categorical attribute values are similar if they appear with similar value distributions on correlated context attributes. Thus, the distance measure is automatically derived from the given data set. We compare our new distance measure to existing categorical distance measures and evaluate on different data sets from the UCI machine-learning repository. The experiments show that our distance measure is recommendable, since it achieves similar or better results in a more robust way than previous approaches.

Cite

CITATION STYLE

APA

Ring, M., Otto, F., Becker, M., Niebler, T., Landes, D., & Hotho, A. (2015). ConDist: A context-driven categorical distance measure. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9284, pp. 251–266). Springer Verlag. https://doi.org/10.1007/978-3-319-23528-8_16

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free