A general methodology is proposed for the explanation of variability in a quantity of interest x in terms of covariates z = (z_1, …, z_L). It provides the conditional mean x̄(z) as a sum of components, where each component is represented as a product of non-parametric one-dimensional functions of each covariate z_l that are computed through an alternating projection procedure. Both x and the z_l can be real or categorical variables; in addition, some or all values of each z_l can be unknown, providing a general framework for multi-clustering, classification and covariate imputation in the presence of confounding factors. The procedure can be considered as a preconditioning step for the more general determination of the full conditional distribution ρ(x|z) through a data-driven optimal-transport barycenter problem. In particular, iterating the procedure just once yields the second-order structure (i.e., the covariance) of ρ(x|z). The methodology is illustrated through examples that include the explanation of variability of ground temperature across the continental United States and the prediction of book preference among potential readers.
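To make the shape of the model concrete, the following is a minimal sketch, not the authors' implementation, of x̄(z) ≈ Σ_k Π_l f_{k,l}(z_l). It assumes every covariate is categorical (integer labels 0, …, n_l − 1) and fully observed, so each one-dimensional function f_{k,l} reduces to a lookup table over categories. Components are fitted greedily, one at a time on the current residual, and each factor is updated by an alternating least-squares step with the other factors held fixed; this is a simple stand-in for the paper's alternating projection procedure. The function names `fit_attributable_components`, `component_value` and `predict` are hypothetical.

```python
import numpy as np


def component_value(f, z):
    """Evaluate one component prod_l f[l][z_l] at every sample (row of z)."""
    out = np.ones(z.shape[0])
    for l, fl in enumerate(f):
        out *= fl[z[:, l]]
    return out


def predict(factors, z):
    """Sum of all fitted components: the estimate of the conditional mean."""
    return sum(component_value(f, z) for f in factors)


def fit_attributable_components(x, z, n_categories, n_components=2,
                                n_sweeps=50, seed=0):
    """Hypothetical sketch: greedy alternating least-squares fit of
    xbar(z) ~ sum_k prod_l f[k][l][z_l] for categorical, fully observed z."""
    rng = np.random.default_rng(seed)
    n, L = z.shape
    residual = np.asarray(x, dtype=float).copy()
    factors = []
    for _ in range(n_components):
        # Start each factor near 1 so the first sweep captures the dominant term.
        f = [1.0 + 0.1 * rng.standard_normal(c) for c in n_categories]
        for _ in range(n_sweeps):
            for l in range(L):
                # Product of the other covariates' factors at each sample.
                g = np.ones(n)
                for m in range(L):
                    if m != l:
                        g *= f[m][z[:, m]]
                # Least-squares update of f[l], one category at a time:
                # f[l][c] = sum_{i: z_il = c} r_i g_i / sum_{i: z_il = c} g_i^2
                num = np.bincount(z[:, l], weights=residual * g,
                                  minlength=n_categories[l])
                den = np.bincount(z[:, l], weights=g * g,
                                  minlength=n_categories[l])
                f[l] = np.where(den > 0, num / np.maximum(den, 1e-12), 0.0)
        factors.append(f)
        # Deflate: the next component explains what this one left over.
        residual -= component_value(f, z)
    return factors
```

For example, with two categorical covariates and x = a[z_1] + b[z_2], two product components suffice, since a ⊗ 1 + 1 ⊗ b has rank two. Real-valued covariates, missing covariate values and the covariance step mentioned in the abstract are outside the scope of this sketch.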
Tabak, E. G., & Trigila, G. (2018). Conditional expectation estimation through attributable components. Information and Inference, 7(4), 727–754. https://doi.org/10.1093/imaiai/iax023