Removing Bias from Diverse Data Clusters for Ensemble Classification

Abstract

Diversity plays an important role in successful ensemble classification. One way to diversify the base-classifiers in an ensemble classifier is to diversify the data they are trained on. Sampling techniques such as bagging have been used for this task in the past; however, we argue that because they maintain the global distribution, they do not engender diversity. We instead make a principled argument for the use of k-Means clustering to create diversity. When creating multiple clusterings with multiple k values, there is a risk that different clusterings discover the same clusters, which would then train identical base-classifiers and bias the ensemble voting process. We propose a new approach that uses the Jaccard Index to detect and remove similar clusters before training the base-classifiers, reducing classification error by removing repeated votes. We demonstrate the effectiveness of our proposed approach by comparing it to three state-of-the-art ensemble algorithms on eight UCI datasets.
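
The cluster-filtering idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes each cluster is represented as a set of record indices, and the function names `jaccard` and `remove_similar_clusters` and the similarity `threshold` of 0.9 are hypothetical choices made for illustration only. The abstract does not state the threshold or the order in which clusters are compared.

```python
from typing import List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard Index |a ∩ b| / |a ∪ b| of two clusters of record indices."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty clusters
    return len(a & b) / len(union)


def remove_similar_clusters(
    clusters: List[Set[int]], threshold: float = 0.9  # hypothetical cutoff
) -> List[Set[int]]:
    """Keep a cluster only if it is not too similar to any cluster already kept."""
    kept: List[Set[int]] = []
    for cluster in clusters:
        if all(jaccard(cluster, other) < threshold for other in kept):
            kept.append(cluster)
    return kept


# Toy example: two clusterings (k=2 and k=3) over records 0..9.
# Both clusterings discover the same cluster {0, 1, 2, 3, 4}.
k2 = [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}]
k3 = [{0, 1, 2, 3, 4}, {5, 6, 7}, {8, 9}]

unique = remove_similar_clusters(k2 + k3)
print(len(unique))  # 4: the repeated cluster is kept only once
```

In this toy case the duplicate cluster would otherwise train two identical base-classifiers and count as two votes; after filtering, each surviving cluster contributes a single base-classifier.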

Cite

Fletcher, S., & Verma, B. (2017). Removing Bias from Diverse Data Clusters for Ensemble Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10637 LNCS, pp. 140–149). Springer Verlag. https://doi.org/10.1007/978-3-319-70093-9_15
