A Novel Ensemble Method for Detecting Outliers in Categorical Data

Thomas R.

Abstract

© 2020, World Academy of Research in Science and Engineering. All rights reserved.

Outliers are data objects that are rare in their datasets but often carry important and valuable information. Researchers have developed many algorithms for finding outliers in different types of datasets, such as multivariate, time series, image and high-dimensional datasets. These algorithms are specific to one type of dataset, and no general-purpose algorithm for detecting outliers across different types of datasets exists in the literature. Moreover, most algorithms in the literature handle numerical data only, whereas real-world datasets may contain categorical features in addition to numerical ones. Here, we propose a novel ensemble learning method for finding outliers in categorical datasets. It combines two encodings, one-hot encoding and label encoding, with several outlier detection algorithms: Local Outlier Factor, One-Class Support Vector Machine, Elliptic Envelope, Isolation Forest and k-Nearest Neighbor. Experimental results on real-world datasets show that the proposed ensemble method outperforms the other outlier detection techniques.
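The abstract does not say how the encoders and detectors are combined, so the following is only a minimal sketch of one plausible reading, written with scikit-learn. Each categorical matrix is encoded twice (one-hot, and label-style integer codes via `OrdinalEncoder`, scikit-learn's column-wise counterpart of `LabelEncoder`). Four of the five named detectors then score each encoding, and the resulting score vectors are merged by averaging their normalized ranks. The rank-averaging rule, the kNN score (distance to the k-th nearest neighbour), `k=5`, the helper names and the toy data are all assumptions, not the paper's method. Elliptic Envelope is left out of this sketch because its covariance estimate is singular on one-hot data: the indicator columns of each feature always sum to 1.

```python
import numpy as np
from scipy.stats import rankdata
from sklearn.ensemble import IsolationForest
from sklearn.neighbors import LocalOutlierFactor, NearestNeighbors
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.svm import OneClassSVM


def encode_views(X):
    """Return two numeric encodings of a categorical matrix X (n_samples, n_features)."""
    one_hot = OneHotEncoder().fit_transform(X).toarray()  # sparse -> dense 0/1 matrix
    # OrdinalEncoder applies LabelEncoder-style integer codes column by column.
    labels = OrdinalEncoder().fit_transform(X)
    return [one_hot, labels]


def detector_scores(Z, k=5, seed=0):
    """Score every row with four detectors; higher always means more anomalous."""
    lof = LocalOutlierFactor(n_neighbors=k).fit(Z)
    # Assumed kNN score: distance to the k-th nearest *other* row
    # (kneighbors() without arguments excludes each point from its own neighbours).
    knn_dist, _ = NearestNeighbors(n_neighbors=k).fit(Z).kneighbors()
    return [
        -lof.negative_outlier_factor_,
        -OneClassSVM().fit(Z).decision_function(Z),
        -IsolationForest(random_state=seed).fit(Z).score_samples(Z),
        knn_dist[:, -1],
    ]


def ensemble_outlier_scores(X, k=5, seed=0):
    """Average normalized ranks over all (encoding, detector) pairs (assumed rule)."""
    ranks = [
        rankdata(s) / len(s)  # ranks scaled into (0, 1]; ties share their average rank
        for Z in encode_views(X)
        for s in detector_scores(Z, k=k, seed=seed)
    ]
    return np.mean(ranks, axis=0)


# Hypothetical toy data: four common category patterns plus one unseen combination.
normal = [["red", "circle", "small"], ["red", "circle", "large"],
          ["blue", "square", "small"], ["blue", "square", "large"]] * 5
X = np.array(normal + [["yellow", "triangle", "tiny"]], dtype=object)
scores = ensemble_outlier_scores(X)
print(int(np.argmax(scores)))  # → 20, the row with the unseen combination
```

Ranks are averaged here because the four detectors produce scores on unrelated scales. Min-max normalization of the raw scores, or majority voting on thresholded labels, would be equally plausible readings of "ensemble".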

Citation (APA)

Thomas, R. (2020). A Novel Ensemble Method for Detecting Outliers in Categorical Data. International Journal of Advanced Trends in Computer Science and Engineering, 9(4), 4947–4953. https://doi.org/10.30534/ijatcse/2020/108942020
