Removing Bias from Diverse Data Clusters for Ensemble Classification

Sam Fletcher; Brijesh Verma

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2017) 10637 LNCS 140-149

DOI: 10.1007/978-3-319-70093-9_15


Abstract

Diversity plays an important role in successful ensemble classification. One way to diversify the base-classifiers in an ensemble classifier is to diversify the data they are trained on. Sampling techniques such as bagging have been used for this task in the past; however, we argue that because they maintain the global distribution, they do not engender diversity. We instead make a principled argument for the use of k-Means clustering to create diversity. When creating multiple clusterings with multiple k values, there is a risk that different clusterings will discover the same clusters, which would then train the same base-classifiers. This would bias the ensemble voting process. We propose a new approach that uses the Jaccard Index to detect and remove similar clusters before training the base-classifiers, reducing classification error by removing repeated votes. We demonstrate the effectiveness of our proposed approach by comparing it to three state-of-the-art ensemble algorithms on eight UCI datasets.
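The cluster-filtering idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the greedy keep-first strategy, the 0.9 similarity threshold, and the toy cluster memberships are all assumptions made for illustration. Each cluster is represented as a set of sample indices, and a cluster is dropped when its Jaccard Index with an already kept cluster reaches the threshold.

```python
from typing import List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Jaccard Index |a ∩ b| / |a ∪ b| of two sets of sample indices."""
    union = len(a | b)
    # Two empty sets are treated as identical (an assumption of this sketch).
    return len(a & b) / union if union else 1.0


def remove_similar_clusters(
    clusters: List[Set[int]], threshold: float = 0.9
) -> List[Set[int]]:
    """Greedily keep a cluster only if it is dissimilar to every kept cluster.

    The threshold value is hypothetical; the abstract does not state one.
    """
    kept: List[Set[int]] = []
    for candidate in clusters:
        if all(jaccard(candidate, k) < threshold for k in kept):
            kept.append(candidate)
    return kept


# Hypothetical memberships from two k-Means runs (k=2 and k=3) on ten samples.
clustering_k2 = [{0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}]
clustering_k3 = [{0, 1, 2, 3, 4}, {5, 6, 7}, {8, 9}]

all_clusters = clustering_k2 + clustering_k3
unique_clusters = remove_similar_clusters(all_clusters, threshold=0.9)

# The repeated cluster {0, 1, 2, 3, 4} is removed, so it would train
# only one base-classifier and cast only one vote.
print(len(all_clusters), len(unique_clusters))
```

In this toy case the cluster found by both clusterings is kept once, while the partially overlapping clusters (Jaccard Index 0.6 and 0.4 against the larger cluster) stay in the pool. How the retained clusters are then used to train and combine base-classifiers is not specified in the abstract and is left out here.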

Citation (APA)

Fletcher, S., & Verma, B. (2017). Removing Bias from Diverse Data Clusters for Ensemble Classification. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 10637 LNCS, pp. 140–149). Springer Verlag. https://doi.org/10.1007/978-3-319-70093-9_15
