When analyzing high-dimensional input/output systems, it is common to perform sensitivity analysis to identify important variables and thereby reduce the complexity and computational cost of the problem. To perform sensitivity analysis on a fixed data set, i.e., without the possibility of further sampling, we fit a surrogate model to the data. This paper explores the effects of model error on sensitivity analysis, using Sobol' indices (SI) as the primary measure of variable importance. SI measure the share of output variance contributed by a variable alone (first-order indices) and by a variable together with its interactions with other variables (total indices). We also examine partial derivative measures of sensitivity. All analysis is based on data generated by test functions for which the true SI are known. We fit two non-parametric models, Multivariate Adaptive Regression Splines (MARS) and Random Forest, to the test data and approximate the SI using R routines. We derive an analytic solution for the SI based on the MARS basis functions and compare it to the actual and approximated SI. Further, we apply MARS and Random Forest to data sets of increasing size to explore how the error converges as the available data increase. Due to efficiency constraints in the surrogate models, a constant relative error is quickly reached and maintained despite the increasing size of the data. We find that variable importance and SI are well approximated, even in cases where there is significant error in the surrogate model.
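To make the notion of first-order and total Sobol' indices concrete, the following is a minimal Python sketch of a Monte Carlo estimate on a test function with known indices. It is an illustration under stated assumptions, not the paper's procedure: the abstract does not name its test functions or R routines, so the Ishigami function, the pick-and-freeze estimators (Saltelli 2010 form for first-order, Jansen form for total), and the names `ishigami` and `sobol_indices` are all choices made here for illustration.

```python
import numpy as np


def ishigami(x, a=7.0, b=0.1):
    """Ishigami test function (assumed here; inputs uniform on [-pi, pi])."""
    return (np.sin(x[:, 0])
            + a * np.sin(x[:, 1]) ** 2
            + b * x[:, 2] ** 4 * np.sin(x[:, 0]))


def sobol_indices(f, dim, n, low, high, rng):
    """Estimate first-order and total Sobol' indices by pick-and-freeze sampling."""
    # Two independent sample matrices of shape (n, dim).
    A = rng.uniform(low, high, size=(n, dim))
    B = rng.uniform(low, high, size=(n, dim))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))  # estimate of the total output variance

    first = np.empty(dim)
    total = np.empty(dim)
    for i in range(dim):
        # A with its i-th column replaced by the i-th column of B.
        AB = A.copy()
        AB[:, i] = B[:, i]
        fAB = f(AB)
        # First-order index: variance explained by input i alone.
        first[i] = np.mean(fB * (fAB - fA)) / var
        # Total index: input i alone plus all of its interactions.
        total[i] = 0.5 * np.mean((fA - fAB) ** 2) / var
    return first, total


rng = np.random.default_rng(0)
first, total = sobol_indices(ishigami, dim=3, n=100_000,
                             low=-np.pi, high=np.pi, rng=rng)

# Analytic reference values for a=7, b=0.1:
#   first-order: 0.314, 0.442, 0.000
#   total:       0.558, 0.442, 0.244
print("first-order:", np.round(first, 3))
print("total:      ", np.round(total, 3))
```

In a fixed-data setting like the one described above, `f` would be the prediction function of a fitted surrogate rather than the true function, so any gap between the estimates and the analytic values combines Monte Carlo error with surrogate model error.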
Jutz, K. (2016). Accuracy of data-based sensitivity indices. SIAM Undergraduate Research Online, 9. https://doi.org/10.1137/15s014757