Correlation and Probability Based Similarity Measure for Detecting Outliers in Categorical Data

Roy Thomas*; J.E. Judith

Journal Article

International Journal of Innovative Technology and Exploring Engineering (2020) 9(3) 2577-2582

DOI: 10.35940/ijitee.c9053.019320

Abstract

Determining the similarity or distance between data objects is an important part of many research fields, such as statistics, data mining, and machine learning. Many measures are available in the literature for defining the distance between two numerical data objects. Defining such a measure for categorical data objects is harder because categorical values have no inherent order, and only a few distance measures for categorical data are available in the literature. This paper presents a comparative evaluation of various similarity measures for categorical data and introduces a novel similarity measure for categorical data based on occurrence frequency and correlation. We evaluated the performance of these similarity measures on the outlier detection task in data mining using real-world data sets. Experimental results show that the proposed similarity measure outperforms the existing similarity measures at detecting outliers in categorical data sets.
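To make the setting in the abstract concrete, the sketch below shows two generic similarity measures for categorical records and a simple nearest-neighbour outlier score built on them. It is an illustration under stated assumptions, not the measure proposed in the paper, which the abstract does not specify. The function names, the toy data, the log-based occurrence-frequency weighting for mismatches, and the choice of k are all hypothetical.

```python
import math
from collections import Counter


def overlap_similarity(x, y):
    """Fraction of attributes on which two categorical records agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def value_frequencies(data):
    """One Counter per attribute, counting how often each category occurs."""
    return [Counter(column) for column in zip(*data)]


def frequency_weighted_similarity(x, y, freqs, n):
    """Occurrence-frequency style similarity (an assumed, generic weighting).

    A match scores 1. A mismatch scores less when either value is rare,
    so disagreeing on unusual categories counts as more dissimilar.
    Both records must come from the data used to build `freqs`.
    """
    total = 0.0
    for a, b, f in zip(x, y, freqs):
        if a == b:
            total += 1.0
        else:
            total += 1.0 / (1.0 + math.log(n / f[a]) * math.log(n / f[b]))
    return total / len(x)


def knn_outlier_scores(data, k=2):
    """Score each record as 1 minus its mean similarity to its k most similar peers."""
    freqs = value_frequencies(data)
    n = len(data)
    scores = []
    for i, x in enumerate(data):
        sims = sorted(
            (frequency_weighted_similarity(x, y, freqs, n)
             for j, y in enumerate(data) if j != i),
            reverse=True,
        )
        top = sims[:k]
        scores.append(1.0 - sum(top) / len(top))
    return scores


# Hypothetical toy data: (colour, size, shape)
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "large", "round"),
    ("blue", "large", "square"),
]

scores = knn_outlier_scores(data, k=2)
most_outlying = max(range(len(data)), key=scores.__getitem__)
print(data[most_outlying], round(scores[most_outlying], 3))
```

On this toy data the last record, which disagrees with most others on rare values, receives the highest outlier score. A correlation-aware measure, as described in the abstract, would additionally account for dependencies between attributes, which this sketch ignores.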

Cite

Thomas*, R., & Judith, J. E. (2020). Correlation and Probability Based Similarity Measure for Detecting Outliers in Categorical Data. International Journal of Innovative Technology and Exploring Engineering, 9(3), 2577–2582. https://doi.org/10.35940/ijitee.c9053.019320
