Akaike’s Information Criterion: Background, Derivation, Properties, and Refinements

  • Cavanaugh, J. E.
  • Neath, A. A.

Abstract

In statistics, the technique of least squares is used for estimating the unknown parameters in a linear regression model (see Linear Regression Models). This method minimizes the sum of squared distances between the observed responses in a set of data and the fitted responses from the regression model. Suppose we observe a collection of data $\{y_i, x_i\}_{i=1}^{n}$ on $n$ units, where the $y_i$ are responses and $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ is a vector of predictors. It is convenient to write the model in matrix notation as

$$y = X\beta + \varepsilon, \qquad (1)$$

where $y$ is the $n \times 1$ vector of responses, $X$ is the $n \times p$ matrix known as the design matrix, $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the unknown parameter vector, and $\varepsilon$ is the vector of random errors. In ordinary least squares (OLS) regression, we estimate $\beta$ by minimizing the residual sum of squares, $\mathrm{RSS} = (y - X\beta)^T (y - X\beta)$, giving $\hat{\beta}_{\mathrm{OLS}} = (X^T X)^{-1} X^T y$. This estimator is simple and has some good statistical properties. However, it suffers from lack of uniqueness if the design matrix $X$ is less than full rank, and it is unstable if the columns of $X$ are (nearly) collinear. To achieve better prediction and to alleviate the ill-conditioning of $X^T X$, Hoerl and Kennard (1970) introduced ridge regression (see Ridge and Surrogate Ridge Regressions), which minimizes the RSS subject to the constraint $\sum_j \beta_j^2 \le t$; in other words,

$$\hat{\beta}_{\mathrm{ridge}} = \operatorname*{argmin}_{\beta} \left\{ \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right\}, \qquad (2)$$

where $\lambda \ge 0$ is known as the complexity parameter that controls the amount of shrinkage: the larger the value of $\lambda$, the greater the amount of shrinkage. The quadratic penalty term makes $\hat{\beta}_{\mathrm{ridge}}$ a linear function of $y$. Frank and Friedman (1993) introduced bridge regression, a generalized version of penalty (or absolute penalty type) estimation, which includes ridge regression when $\gamma = 2$. For a given penalty function $\pi(\cdot)$ and regularization parameter $\lambda$, the general form can be written as $\phi(\beta) = (y - X\beta)^T (y - X\beta) + \lambda \pi(\beta)$, where the penalty function is of the form

$$\pi(\beta) = \sum_{j=1}^{p} |\beta_j|^{\gamma}, \qquad \gamma > 0. \qquad (3)$$

The penalty function in (3) bounds the $L_{\gamma}$ norm of the parameters in the given model as $\sum_{j=1}^{p} |\beta_j|^{\gamma} \le t$, where $t$ is the tuning parameter that controls the amount of shrinkage. We see that for $\gamma = 2$ we obtain ridge regression. However, if $\gamma \ne 2$, the penalty function is not rotationally invariant. Interestingly, for $\gamma \le 1$ it shrinks the coefficients toward zero and, depending on the value of $\lambda$, sets some of them exactly to zero; thus, the procedure combines variable selection and shrinkage of the coefficients in penalized regression. An important member of the penalized least squares (PLS) family is the $L_1$ penalized least squares estimator, or the lasso [least absolute shrinkage and selection operator, Tibshirani (1996)]. In other words, the absolute penalty estimator (APE) arises when the absolute value penalty is used, i.e., $\gamma = 1$ in (3). Similar to ridge regression, the lasso estimates are obtained as $\hat{\beta}_{\mathrm{lasso}}$ …
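To make the estimators described above concrete, here is a minimal numerical sketch that is not taken from the article: the synthetic data, the regularization values, and the helper functions `lasso_cd` and `soft_threshold` are illustrative assumptions. It computes the closed-form OLS and ridge solutions and a simple cyclic coordinate-descent lasso.

```python
# Illustrative sketch (not from the article): OLS, ridge, and lasso estimates
# on synthetic data. Data, lambda values, and helper names are assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
beta_true = np.array([3.0, 1.5, 0.0, 0.0, 2.0])
y = X @ beta_true + rng.normal(scale=1.0, size=n)

# OLS: beta_hat = (X^T X)^{-1} X^T y, computed via a linear solve for stability
beta_ols = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge: minimizes RSS + lam * sum(beta_j^2); closed form adds lam * I
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Lasso (gamma = 1 penalty): no closed form; cyclic coordinate descent with
# soft-thresholding, for the common scaling 0.5 * RSS + lam * ||beta||_1.
def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ beta + X[:, j] * beta[j]   # partial residual excluding x_j
            beta[j] = soft_threshold(X[:, j] @ r_j, lam) / col_sq[j]
    return beta

beta_lasso = lasso_cd(X, y, lam=10.0)
print("OLS:  ", np.round(beta_ols, 2))
print("Ridge:", np.round(beta_ridge, 2))
print("Lasso:", np.round(beta_lasso, 2))
```

With a large enough regularization value, the lasso drives some coefficients exactly to zero while ridge only shrinks them toward zero, mirroring the variable-selection property discussed in the abstract.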

Citation (APA)

Cavanaugh, J. E., & Neath, A. A. (2011). Akaike’s Information Criterion: Background, Derivation, Properties, and Refinements. In International Encyclopedia of Statistical Science (pp. 26–29). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-04898-2_111
