Random Forests for Regression as a Weighted Sum of k-Potential Nearest Neighbors

Abstract

In this paper, we tackle the problem of expressing random forests for regression as weighted sums of datapoints. We study the theoretical behavior of k-potential nearest neighbors (k-PNNs) under bagging and obtain an upper bound on the weight of a datapoint for random forests with any splitting criterion, provided that the trees are unpruned and stop growing only when there are k or fewer datapoints at their leaves. Moreover, we combine this bound with the concept of b-terms (i.e., bootstrap terms), introduced in this paper, to derive the explicit expression of the weights of datapoints in a random k-PNN selection setting, a datapoint selection strategy that we also introduce, and to build a framework for deriving other bagged estimators by a similar procedure. Finally, we derive from our framework the explicit expression of the weights of a regression estimate equivalent to a random forest regression estimate with the random splitting criterion, and we demonstrate this equivalence both theoretically and empirically.
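To make the weighted-sum view concrete, the sketch below shows how a random forest regression prediction can be rewritten as ŷ(x) = Σᵢ wᵢ(x) yᵢ. This is a minimal illustration of the general representation, not the paper's algorithm: it assumes scikit-learn's RandomForestRegressor with bootstrapping disabled, so the weights depend only on leaf co-membership, and it uses min_samples_split = k + 1 as a rough proxy for trees that stop splitting once a node holds k or fewer datapoints.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy data (hypothetical setup, for illustration only).
X, y = make_regression(n_samples=200, n_features=5, noise=0.5, random_state=0)

# bootstrap=False so every tree sees all training points; the weights below
# then depend only on which training points share a leaf with the query.
# min_samples_split=k+1 approximates trees that stop growing once a node
# contains k or fewer datapoints.
k = 5
forest = RandomForestRegressor(
    n_estimators=100, min_samples_split=k + 1, bootstrap=False, random_state=0
).fit(X, y)

def rf_weights(forest, X_train, x_query):
    """Weights w_i(x) such that the forest prediction equals sum_i w_i(x) * y_i."""
    train_leaves = forest.apply(X_train)                    # (n_samples, n_trees)
    query_leaves = forest.apply(x_query.reshape(1, -1))[0]  # (n_trees,)
    n, T = train_leaves.shape
    w = np.zeros(n)
    for t in range(T):
        in_leaf = train_leaves[:, t] == query_leaves[t]
        w[in_leaf] += 1.0 / in_leaf.sum()  # each tree averages the y's in its leaf
    return w / T                           # the forest averages over trees

x0 = X[0]
w = rf_weights(forest, X, x0)
assert np.isclose(w.sum(), 1.0)
# The weighted sum of training targets reproduces the forest prediction.
assert np.isclose(w @ y, forest.predict(x0.reshape(1, -1))[0])
```

With bootstrapping enabled, the weights would additionally depend on how often each datapoint is resampled into each tree, which is precisely where the paper's b-terms enter the analysis.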

Cite

APA

Fernández-González, P., Bielza, C., & Larrañaga, P. (2019). Random forests for regression as a weighted sum of k-potential nearest neighbors. IEEE Access, 7, 25660–25672. https://doi.org/10.1109/ACCESS.2019.2900755
