A conditional probability distribution-based dissimilarity measure for categorial data

Le Si Quang; Ho Tu Bao

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2004) 3056 580-589

DOI: 10.1007/978-3-540-24775-3_69

Abstract

Measuring the similarity between objects described by categorical attributes is difficult because no relations between categorical values can be mathematically specified or easily established. In the literature, most similarity (dissimilarity) measures for categorical data evaluate a pair of values only by whether the two values are identical. In these methods, the similarity (dissimilarity) of a non-identical value pair is simply set to 0 (1). In this paper, we introduce a dissimilarity measure for categorical data that imposes association relations between non-identical value pairs of an attribute based on their relations with the other attributes. The key idea is to measure the similarity between two values of a categorical attribute by the similarity of the conditional probability distributions of the other attributes conditioned on these two values. Experiments with a nearest neighbor algorithm demonstrate the merits of our proposal on real-life data sets.
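
The key idea above can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not say which distance between conditional distributions is used or how the per-attribute terms are combined, so the total variation distance, the averaging over the other attributes, the sum over attributes for whole objects, the helper names, and the toy data set are all illustrative assumptions. A plain overlap measure (0 for identical values, 1 otherwise) is included for contrast.

```python
from collections import Counter

# Hypothetical toy data set: (color, shape, taste) for each object.
DATA = [
    ("red", "round", "sweet"),
    ("red", "round", "sweet"),
    ("orange", "round", "sweet"),
    ("green", "long", "sour"),
    ("green", "long", "sour"),
]


def conditional_dist(data, a, value, b):
    """Empirical distribution P(attribute b | attribute a == value).

    Assumes `value` occurs at least once in column a.
    """
    counts = Counter(row[b] for row in data if row[a] == value)
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions.

    Assumption: chosen for illustration; the paper may use another divergence.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in support)


def value_dissimilarity(data, a, x, y):
    """Dissimilarity of values x and y of attribute a.

    Compares the conditional distributions of every other attribute given
    a == x and a == y, then averages the distances (an assumed aggregation).
    """
    if x == y:
        return 0.0
    others = [b for b in range(len(data[0])) if b != a]
    return sum(
        total_variation(conditional_dist(data, a, x, b),
                        conditional_dist(data, a, y, b))
        for b in others
    ) / len(others)


def object_dissimilarity(data, u, v):
    """Dissimilarity of two objects: sum of per-attribute value dissimilarities (assumed)."""
    return sum(value_dissimilarity(data, a, u[a], v[a]) for a in range(len(u)))


def overlap_dissimilarity(u, v):
    """Baseline overlap measure: counts attributes whose values differ."""
    return sum(1 for x, y in zip(u, v) if x != y)


if __name__ == "__main__":
    print(value_dissimilarity(DATA, 0, "red", "orange"))
    print(value_dissimilarity(DATA, 0, "red", "green"))
    print(overlap_dissimilarity(DATA[0], DATA[2]), object_dissimilarity(DATA, DATA[0], DATA[2]))
```

On this toy data, `red` and `orange` always co-occur with `round` and `sweet`, so their value dissimilarity is 0.0, while `red` and `green` receive 1.0; the overlap measure would treat both pairs as equally different (1). In a nearest neighbor setting, the value-level dissimilarities would normally be precomputed once per attribute rather than recomputed for every object pair as in this sketch.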

Citation (APA)

Quang, L. S., & Bao, H. T. (2004). A conditional probability distribution-based dissimilarity measure for categorial data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3056, pp. 580–589). Springer Verlag. https://doi.org/10.1007/978-3-540-24775-3_69