Correlation and Probability Based Similarity Measure for Detecting Outliers in Categorical Data

  • Thomas* R
  • et al.

Abstract

Determining the similarity or distance between data objects is an important part of many research fields, such as statistics, data mining, and machine learning. Many measures are available in the literature to define the distance between two numerical data objects. Defining such a metric for two categorical data objects is difficult because categorical values are not ordered, and only a few distance measures are available in the literature for finding similarities among categorical data objects. This paper presents a comparative evaluation of various similarity measures for categorical data and introduces a novel similarity measure for categorical data based on occurrence frequency and correlation. We evaluated the performance of these similarity measures on the outlier detection task in data mining, using real-world data sets. Experimental results show that the proposed similarity measure outperforms the existing similarity measures at detecting outliers in categorical data sets.
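
The abstract does not spell out the proposed measure, so the following is only a minimal sketch of the general idea of scoring outliers with a categorical similarity measure. It assumes two baseline measures that are not taken from this paper: simple overlap, and an occurrence-frequency style weighting in which a mismatch between rare values counts as less similar than a mismatch between common ones. The function names, the toy data, and the nearest-neighbour scoring rule are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def overlap_similarity(x, y):
    """Fraction of attributes on which two records agree."""
    return sum(a == b for a, b in zip(x, y)) / len(x)


def attribute_frequencies(data):
    """One Counter per attribute, counting how often each value occurs."""
    return [Counter(column) for column in zip(*data)]


def of_similarity(x, y, freqs, n):
    """Occurrence-frequency style similarity (assumed form, not the paper's).

    A match scores 1. A mismatch scores 1 / (1 + log(n/f(a)) * log(n/f(b))),
    so mismatches between rare values score lower than between common ones.
    """
    total = 0.0
    for k, (a, b) in enumerate(zip(x, y)):
        if a == b:
            total += 1.0
        else:
            total += 1.0 / (
                1.0 + math.log(n / freqs[k][a]) * math.log(n / freqs[k][b])
            )
    return total / len(x)


def knn_outlier_scores(data, similarity, k=2):
    """Score each record as 1 minus its mean similarity to its k most
    similar other records; higher scores suggest outliers."""
    scores = []
    for i, x in enumerate(data):
        sims = sorted(
            (similarity(x, y) for j, y in enumerate(data) if j != i),
            reverse=True,
        )
        scores.append(1.0 - sum(sims[:k]) / k)
    return scores


# Toy categorical data set (hypothetical); the last record is the odd one out.
data = [
    ("red", "small", "round"),
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("red", "small", "square"),
    ("blue", "large", "flat"),
]
freqs = attribute_frequencies(data)
n = len(data)

overlap_scores = knn_outlier_scores(data, overlap_similarity)
of_scores = knn_outlier_scores(
    data, lambda x, y: of_similarity(x, y, freqs, n)
)
```

On this toy data both measures give the last record the highest outlier score. A comparative evaluation of the kind the abstract describes would swap different similarity functions into the same scoring step and compare how well the known outliers are ranked.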

Cite

Thomas, R., & Judith, J. E. (2020). Correlation and Probability Based Similarity Measure for Detecting Outliers in Categorical Data. International Journal of Innovative Technology and Exploring Engineering, 9(3), 2577–2582. https://doi.org/10.35940/ijitee.c9053.019320
