Abstract
A point of view, as well as a collection of techniques, is advocated here. The techniques, in this case a series of diagnostics, can be formulated and illustrated explicitly. The spirit in which they are developed, however, is more difficult to convey. Given a point of view, techniques that support it may be replaced quite easily; the converse is seldom true. An effort will be made, therefore, to summarize our approach to multicollinearity and to contrast it with alternative views of the problem.

Multicollinearity, as defined here, is a statistical rather than a mathematical condition. As such, one thinks, and speaks, in terms of the problem's severity rather than of its existence or non-existence. As viewed here, multicollinearity is a property of the independent variable set alone. No account whatever is taken of the extent, or even the existence, of dependence between y and X. It is true, of course, that the effect of interdependence in X on estimation and specification (reflected in the variances of estimated regression coefficients, and in a tendency toward misspecification) also depends partly on the strength of the dependence between y and X. In order to treat the problem, however, it is important to distinguish between nature and effects, and to develop diagnostics based on the former. In our view an independent variable set X is no less multicollinear if related to one dependent variable than if related to another, even though its effects may be more serious in one case than in the other.

Of multicollinearity's effects on the structural integrity of estimated econometric models (estimation instability and structural misspecification), the latter, in our view, is the more serious. Sensitivity of parameter estimates to changes in specification, sample coverage, and the like is reflected at least partially in the standard deviations of estimated regression coefficients. No indication at all exists, however, of the bias imparted to coefficient estimates by incorrectly omitting a relevant, yet multicollinear, variable from an independent variable set.

Historical approaches to multicollinearity are almost unanimous in presuming the problem's solution to lie in deciding which variables to keep and which to drop from a model. That the gap between a model's informational requirements and a data set's informational content can be reduced by increasing available information, as well as by reducing model complexity, is all too seldom considered. A major aim of the present approach, on the other hand, is to provide sufficiently detailed insight into the location and pattern of interdependence within a set of independent variables that strategic additions of information become not only a theoretical possibility, but also a practically feasible course of action.

Selectivity, however, is emphasized. This is not a counsel of perfection. The purpose of regression analysis is to estimate the structure of a dependent variable y's dependence on a pre-selected set of independent variables X, not to select an orthogonal independent variable set. Structural integrity over an entire set, admittedly, requires both complete specification and internal orthogonality. One cannot obtain reliable estimates for an entire n-dimensional structure, or distinguish between competing n-dimensional hypotheses, with fewer than n significant dimensions of independent variation.
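As a concrete illustration of two of the claims above, that multicollinearity is a matter of degree and that it is a property of X alone, the following sketch (ours, not the authors'; the data and names are invented) computes the determinant of the correlation matrix for two small variable sets. No dependent variable y enters the calculation at any point.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Two candidate independent-variable sets sharing the same first column.
x1 = rng.normal(size=n)
x2_orth = rng.normal(size=n)              # unrelated to x1
x2_coll = x1 + 0.05 * rng.normal(size=n)  # nearly a linear copy of x1

for label, x2 in [("near-orthogonal", x2_orth), ("collinear", x2_coll)]:
    R = np.corrcoef(np.column_stack([x1, x2]), rowvar=False)
    # |R| runs from 1 (orthogonal columns) down toward 0 (exact linear
    # dependence): a continuous measure of severity, computed from X
    # alone, with no reference to any dependent variable y.
    print(label, "det(R) =", round(np.linalg.det(R), 4))
```

The determinant near 1 in the first case and near 0 in the second expresses severity as a continuum rather than a yes-or-no condition.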
Yet all variables are seldom equally important. Only one, or at most two or three, strategically important variables are ordinarily present in a regression equation. With complete specification and detailed insight into the location and pattern of interdependence in X, structural instability within the critical subset can be evaluated and, if necessary, corrected. Multicollinearity among non-critical variables can be tolerated. Should critical variables also be affected, additional information is required to provide coefficient estimates either for the essential variables directly, or for those members of the set on which they are principally dependent. Detailed diagnostics for the pattern of interdependence that undermines the experimental quality of X permit such information to be developed and applied both frugally and effectively.

Insight into the pattern of interdependence that affects an independent variable set can be provided in many ways. The entire field of factor analysis, for example, is designed to handle such problems. The advantages of the measures proposed here are two-fold. The first is pragmatic: while factor analysis involves extensive separate computations, the present set of measures relies entirely on transformations of statistics, such as the determinant |X'X| and elements of the inverse correlation matrix (X'X)^-1, that are generated routinely during standard regression computations. The second is symmetry: questions of dependence and interdependence in regression analysis are handled in the same conceptual and statistical framework. Variables that are internal to a set X for one purpose are viewed as external to a subset of it for another. In this vein, tests of interdependence are approached as successive tests of each independent variable's dependence on the other members of the set.

Persons of Bayesian bent who object to the use of probability levels in any form, either as measures of sample properties or as a basis for inference about population characteristics, may prefer mathematical to statistical measures of interdependence. Accordingly, determinants, multiple correlation coefficients, and partial correlation coefficients are available. Their use is most defensible where extremely large sample sizes insure extremely small probability levels. Probabilistic measures such as chi-square, F, and t transformations, on the other hand, may be preferred by others, especially where small samples increase a measure's sensitivity to available degrees of freedom. In either case, the conceptual and computational apparatus of regression analysis may be used to provide a quick and simple, yet serviceable, substitute for the factor analysis of an independent variable set.

It would be pleasant to conclude on a note of triumph that the problem has been solved and that no further "revisits" are necessary. Such a feeling, clearly, would be misleading. Diagnosis, although a necessary first step, does not insure cure. No miraculous "instant orthogonalization" can be offered. We do, however, close on a note of optimism. The diagnostics described here offer the econometrician a place to begin. In combination with a spirit of selectivity in obtaining and applying additional information, multicollinearity may return from the realm of impossible to that of difficult, yet tractable, econometric problems.
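To show how diagnostics of the kind named above fit together in practice, here is a minimal sketch (not the paper's code; it uses Bartlett's standard chi-square approximation for the determinant, and the function and variable names are ours). It produces both the mathematical measures (the determinant and squared multiple correlations) and their probabilistic transformations (chi-square and F), from quantities a regression program already computes.

```python
import numpy as np
from scipy import stats

def collinearity_diagnostics(X):
    """Diagnostics for an n x p independent-variable set X.

    Returns |R|, a chi-square transformation of |R| for the set as a
    whole, and, per column, the squared multiple correlation with the
    remaining columns together with its F transformation.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)   # correlation matrix of the columns
    det_R = np.linalg.det(R)           # 1 = orthogonal, 0 = singular

    # Chi-square transformation of the determinant (Bartlett's
    # approximation), df = p(p-1)/2: departure of the set as a whole
    # from internal orthogonality.
    chi2 = -(n - 1 - (2 * p + 5) / 6) * np.log(det_R)
    chi2_p = stats.chi2.sf(chi2, p * (p - 1) / 2)

    # Each variable's dependence on the other members of the set,
    # recovered from the diagonal of the inverse correlation matrix:
    # R_j^2 = 1 - 1/r^jj, with an F transformation on (p-1, n-p) df.
    r_jj = np.diag(np.linalg.inv(R))
    R2 = 1.0 - 1.0 / r_jj
    F = (R2 / (p - 1)) / ((1.0 - R2) / (n - p))
    F_p = stats.f.sf(F, p - 1, n - p)

    return det_R, (chi2, chi2_p), list(zip(R2, F, F_p))
```

A determinant near zero, a large chi-square for the set, or a large F for a particular column all signal the same condition from different angles; the per-variable statistics locate which members of the set carry the interdependence, and therefore where additional information would be spent most effectively.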
Citation
Farrar, D. E., & Glauber, R. R. (1967). Multicollinearity in Regression Analysis: The Problem Revisited. The Review of Economics and Statistics, 49(1), 92. https://doi.org/10.2307/1937887