Abstract
In this paper, we study random forests for regression expressed as weighted sums of datapoints. We analyze the theoretical behavior of k-potential nearest neighbors (k-PNNs) under bagging and obtain an upper bound on the weight of a datapoint for random forests with any splitting criterion, provided that the trees are unpruned and stop growing only when there are k or fewer datapoints at their leaves. Combining this bound with the concept of b-terms (i.e., bootstrap terms), which we introduce in this paper, we derive the explicit expression of the datapoint weights under random k-PNN selection, a datapoint selection strategy that we also introduce, and we build a framework for deriving other bagged estimators by the same procedure. Finally, from this framework we derive the explicit weights of a regression estimate equivalent to a random forest regression estimate with the random splitting criterion, and we demonstrate this equivalence both theoretically and empirically.
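The abstract's central object, the representation of a forest prediction as a weighted sum over training points, ŷ(x) = Σᵢ wᵢ(x) yᵢ, can be made concrete for an ordinary fitted forest. Below is a minimal sketch, not the authors' code: it assumes scikit-learn, the helper name `forest_weights` is ours, and bootstrapping is disabled so that the co-leaf weights alone reproduce the prediction exactly.

```python
# Minimal sketch (assumed setup, not from the paper): recover the
# per-datapoint weights w_i(x) of a fitted random forest regressor.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

def forest_weights(rf, X_train, x):
    """w_i(x): per tree, 1/|leaf(x)| for training points sharing x's
    leaf, 0 otherwise; averaged over trees."""
    leaves = rf.apply(X_train)             # (n_train, n_trees) leaf ids
    q = rf.apply(x.reshape(1, -1))[0]      # (n_trees,) leaf id of x in each tree
    same = leaves == q                     # co-leaf indicator per tree
    return (same / same.sum(axis=0)).mean(axis=1)

X, y = make_regression(n_samples=300, n_features=5, noise=1.0, random_state=0)
# min_samples_leaf loosely plays the role of the paper's leaf-size
# parameter k; bootstrap=False keeps the co-leaf weights exact.
rf = RandomForestRegressor(n_estimators=100, min_samples_leaf=5,
                           bootstrap=False, random_state=0).fit(X, y)

x0 = X[0]
w = forest_weights(rf, X, x0)
print(np.allclose(w @ y, rf.predict(x0.reshape(1, -1))[0]))  # True
```

With bootstrap=True the check no longer holds as written: points absent from a tree's bootstrap sample carry zero weight in that tree and duplicated points carry multiple copies, which is precisely the accounting the paper's b-terms are introduced to formalize.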
Citation
Fernández-González, P., Bielza, C., & Larrañaga, P. (2019). Random Forests for Regression as a Weighted Sum of k-Potential Nearest Neighbors. IEEE Access, 7, 25660–25672. https://doi.org/10.1109/ACCESS.2019.2900755