We have already made use of models linear in the input features, both for regression and classification. Linear regression, linear discriminant analysis, logistic regression and separating hyperplanes all rely on a linear model. It is extremely unlikely that the true function f(X) is actually linear in X. In regression problems, f(X) = E(Y|X) will typically be nonlinear and nonadditive in X, and representing f(X) by a linear model is usually a convenient, and sometimes a necessary, approximation. Convenient because a linear model is easy to interpret, and is the first-order Taylor approximation to f(X). Sometimes necessary, because with N small and/or p large, a linear model might be all we are able to fit to the data without overfitting. Likewise in classification, a linear, Bayes-optimal decision boundary implies that some monotone transformation of Pr(Y = 1|X) is linear in X. This is inevitably an approximation.

In this chapter and the next we discuss popular methods for moving beyond linearity. The core idea in this chapter is to augment/replace the vector of inputs X with additional variables, which are transformations of X, and then use linear models in this new space of derived input features.

Denote by $h_m(X) : \mathbb{R}^p \mapsto \mathbb{R}$ the $m$th transformation of $X$, $m = 1, \dots, M$. We then model

$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X), \tag{5.1}$$
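To make (5.1) concrete, the following sketch (not part of the original text) fits a simple basis expansion by ordinary least squares: for a scalar input x it uses the cubic polynomial basis h_1(x) = 1, h_2(x) = x, h_3(x) = x^2, h_4(x) = x^3. The simulated data-generating function and all variable names are illustrative assumptions; the fit remains linear in the derived features even though it is nonlinear in x.

import numpy as np

# Illustrative sketch of model (5.1) with a cubic polynomial basis:
# h_1(x) = 1, h_2(x) = x, h_3(x) = x^2, h_4(x) = x^3.

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-2.0, 2.0, size=100))              # simulated inputs (assumption)
y = np.sin(2.0 * x) + rng.normal(scale=0.3, size=x.shape)  # nonlinear truth plus noise (assumption)

# Matrix of derived features, H[i, m] = h_m(x_i).
H = np.column_stack([np.ones_like(x), x, x**2, x**3])

# Ordinary least-squares estimate of the coefficients beta_m.
beta, *_ = np.linalg.lstsq(H, y, rcond=None)

# Fitted values: linear in the derived features h_m(x), nonlinear in x.
y_hat = H @ beta

The same pattern applies to any choice of basis functions h_m: only the construction of the feature matrix H changes, while the fitting step is ordinary linear least squares in the new feature space.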