EvoSplit: An evolutionary approach to split a multi-label data set into disjoint subsets

Abstract

This paper presents EvoSplit, a new evolutionary approach for distributing multi-label data sets into disjoint subsets for supervised machine learning. Currently, data set providers divide a data set either randomly or by iterative stratification, a method that aims to preserve the label (or label pair) distribution of the original data set in each subset. With the same aim, this paper first introduces a single-objective evolutionary approach that seeks a split maximizing the similarity to one of those distributions (labels or label pairs), each considered independently. Second, it presents a new multi-objective evolutionary algorithm that maximizes the similarity for both distributions (labels and label pairs) simultaneously. Both approaches are validated on well-known multi-label data sets as well as on large image data sets currently used in computer vision and machine learning applications. EvoSplit improves on iterative stratification according to several measures: Label Distribution, Label Pair Distribution, Examples Distribution, and the numbers of folds and of fold-label pairs with zero positive examples.
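
The single-objective idea in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the imbalance score below is a simplified stand-in for the Label Distribution measure, and the function names, the swap mutation, and the accept-if-not-worse loop are all assumptions made for illustration.

```python
import random

def label_imbalance(labels, assignment, n_subsets):
    """Mean absolute gap between each subset's per-label positive rate
    and the rate in the whole data set (lower is better).
    Simplified stand-in, not the paper's Label Distribution measure."""
    n_labels = len(labels[0])
    overall = [sum(row[j] for row in labels) / len(labels)
               for j in range(n_labels)]
    total = 0.0
    for s in range(n_subsets):
        members = [row for row, a in zip(labels, assignment) if a == s]
        if not members:
            return float("inf")  # an empty subset is never acceptable
        for j in range(n_labels):
            rate = sum(row[j] for row in members) / len(members)
            total += abs(rate - overall[j])
    return total / (n_subsets * n_labels)

def evolve_split(labels, n_subsets=2, generations=500, seed=0):
    """Hypothetical (1+1)-style search: mutate one split, keep it if not worse."""
    rng = random.Random(seed)
    n = len(labels)
    # Start from a size-balanced random split.
    best = [i % n_subsets for i in range(n)]
    rng.shuffle(best)
    best_cost = label_imbalance(labels, best, n_subsets)
    initial_cost = best_cost
    for _ in range(generations):
        child = best[:]
        # Mutation: swap the subsets of two examples (subset sizes are preserved).
        a, b = rng.sample(range(n), 2)
        child[a], child[b] = child[b], child[a]
        cost = label_imbalance(labels, child, n_subsets)
        if cost <= best_cost:
            best, best_cost = child, cost
    return best, initial_cost, best_cost

# Toy multi-label data: 60 examples, 3 labels with different frequencies.
data_rng = random.Random(1)
toy = [[int(data_rng.random() < p) for p in (0.5, 0.2, 0.05)]
       for _ in range(60)]
split, before, after = evolve_split(toy, n_subsets=3)
print(f"imbalance before: {before:.4f}, after: {after:.4f}")
```

Because a candidate is kept only when its score does not increase, the final imbalance can never exceed the initial one. A multi-objective variant in the spirit of the abstract would score label pairs as well and compare candidates on both scores at once instead of on a single number.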

Cite

Florez-Revuelta, F. (2021). EvoSplit: An evolutionary approach to split a multi-label data set into disjoint subsets. Applied Sciences (Switzerland), 11(6). https://doi.org/10.3390/app11062823
