Abstract
Fundamentally, many problems in Machine Learning are understood as some form of function approximation: given a dataset, learn a function f that maps input variables to a label. However, this overlooks the ubiquitous problem of missing data. For example, if an unseen instance later arrives with missing input variables, we actually need a function that takes only the observed subset of the inputs to predict its label. Strategies to deal with missing data come in three kinds: naive, probabilistic and iterative. The naive approach replaces missing values with a fixed value (e.g. the mean), then uses f as if nothing was ever missing. The probabilistic approach has a generative model of the data and uses probabilistic inference to find the most likely value of the label, given values for any subset of the inputs. The iterative approach consists of a loop: according to some model, fill in all the missing values based on the given ones, retrain on the completed data and redo the predictions, until these converge. MissForest is a well-known realization of this idea using Random Forests. In this work, we establish the connection between MissForest and MERCS (a multi-directional generalization of Random Forests). We go on to show that under certain (realistic) conditions where the retraining step in MissForest becomes a bottleneck, MERCS (which is trained only once) offers on-par predictive performance at a fraction of the time cost.
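The naive and iterative strategies described above can be contrasted in a minimal, dependency-free sketch. This is not the authors' implementation and it is neither MissForest nor MERCS: a tiny k-nearest-neighbour regressor stands in for the Random Forest, purely as an assumption to keep the example self-contained. The names `naive_impute`, `knn_predict` and `iterative_impute`, and the parameters `k`, `max_iter` and `tol`, are hypothetical. Missing entries are encoded as `None`, and every column is assumed to have at least one observed value.

```python
from statistics import mean


def naive_impute(rows):
    """Naive strategy: replace each missing entry (None) by its column mean."""
    col_means = [mean(v for v in col if v is not None) for col in zip(*rows)]
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def knn_predict(train_x, train_y, query, k=2):
    """Stand-in model (an assumption, NOT a Random Forest):
    average target of the k training rows closest to the query."""
    ranked = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    return mean(y for _, y in ranked[:k])


def drop_column(row, j):
    """Return the row without column j (the predictors for column j)."""
    return row[:j] + row[j + 1:]


def iterative_impute(rows, k=2, max_iter=10, tol=1e-9):
    """Iterative strategy: fill in, refit per column, re-predict, until stable."""
    missing = [[v is None for v in row] for row in rows]
    filled = naive_impute(rows)  # initial guess: column means
    for _ in range(max_iter):
        change = 0.0
        for j in range(len(rows[0])):
            # "Retrain": use only rows where column j was actually observed.
            train = [r for r, m in zip(filled, missing) if not m[j]]
            train_x = [drop_column(r, j) for r in train]
            train_y = [r[j] for r in train]
            # Redo the predictions for the entries that were missing.
            for r, m in zip(filled, missing):
                if m[j]:
                    new_value = knn_predict(train_x, train_y, drop_column(r, j), k)
                    change += (new_value - r[j]) ** 2
                    r[j] = new_value
        if change < tol:  # imputed values stopped moving: converged
            break
    return filled


# Toy data where the second column is twice the first; one value is missing.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, None], [5.0, 10.0]]
print(naive_impute(data)[3])
print(iterative_impute(data)[3])
```

The refit inside the loop is the step the abstract identifies as a potential bottleneck in MissForest. A model that is trained once and can predict any variable from any subset of the others, as the abstract describes MERCS, would avoid repeating it.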
Van Wolputte, E., & Blockeel, H. (2020). Missing Value Imputation with MERCS: A Faster Alternative to MissForest. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 12323 LNAI, pp. 502–516). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-030-61527-7_33