Evolving probabilistically significant epistatic classification rules for heterogeneous big datasets

Abstract

We develop an algorithm to evolve sets of probabilistically significant multivariate feature interactions, with co-evolved feature ranges, for classification in large, complex datasets. The datasets may include nominal, ordinal, and/or continuous features, missing data, imbalanced classes, and other complexities. Our age-layered evolutionary algorithm generates conjunctive clauses to model multivariate interactions in datasets that are too large to be analyzed using traditional methods such as logistic regression. Using a novel hypergeometric probability mass function for fitness evaluation, the algorithm automatically archives conjunctive clauses that are probabilistically significant at a given threshold, thus identifying strong complex multivariate interactions. The method is validated on two synthetic epistatic datasets and applied to a complex real-world survey dataset aimed at determining the drivers of household infestation for an insect that transmits Chagas disease. We identify a set of 178,719 predictive feature interactions that are associated with household infestation, thus dramatically reducing the size of the search space for future analysis.
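The fitness idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it only shows how a hypergeometric probability mass function could score one conjunctive clause. The function names, the toy data, the inclusive range encoding of a clause, the rule that a missing value never matches, and the 0.05 threshold are all assumptions made for the example.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(exactly k target-class rows among n matched rows), given that
    K of the N rows overall belong to the target class."""
    lo, hi = max(0, n - (N - K)), min(n, K)
    if not lo <= k <= hi:
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def matches(row, clause):
    """A clause is a conjunction: every listed feature must fall inside its
    inclusive (low, high) range. Assumption: a missing value (None) fails."""
    for feature, (low, high) in clause.items():
        value = row.get(feature)
        if value is None or not low <= value <= high:
            return False
    return True

def clause_fitness(rows, labels, clause, target):
    """Hypergeometric PMF of the observed class count among matched rows.
    Smaller values mean the match is less likely to be due to chance."""
    N = len(rows)
    K = sum(1 for y in labels if y == target)
    matched = [y for x, y in zip(rows, labels) if matches(x, clause)]
    n = len(matched)
    k = sum(1 for y in matched if y == target)
    return hypergeom_pmf(k, N, K, n)

# Hypothetical toy data: 10 rows, 4 of them in the target class (label 1).
rows = [
    {"a": 0, "b": 2}, {"a": 1, "b": 3}, {"a": 1, "b": 2}, {"a": 5, "b": 2},
    {"a": 0, "b": 9}, {"a": 7, "b": 3}, {"a": None, "b": 2}, {"a": 4, "b": 4},
    {"a": 3, "b": 0}, {"a": 9, "b": 9},
]
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]

# Hypothetical clause: (0 <= a <= 1) AND (2 <= b <= 3).
clause = {"a": (0, 1), "b": (2, 3)}
fitness = clause_fitness(rows, labels, clause, target=1)

THRESHOLD = 0.05  # illustrative significance threshold, not from the paper
archived = fitness <= THRESHOLD
```

Here the clause matches three rows, all in the target class, so the fitness is C(4,3)·C(6,0)/C(10,3) = 4/120, which falls below the illustrative threshold and would be archived under this sketch's rule.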

Cite

Hanley, J. P., Eppstein, M. J., Buzas, J. S., & Rizzo, D. M. (2016). Evolving probabilistically significant epistatic classification rules for heterogeneous big datasets. In GECCO 2016 - Proceedings of the 2016 Genetic and Evolutionary Computation Conference (pp. 445–452). Association for Computing Machinery, Inc. https://doi.org/10.1145/2908812.2908931
