This chapter reviews diagnostic procedures for detecting outliers and influential observations in linear regression. First, the statistics for detecting single outliers and influential observations are presented, and their limitations when multiple outliers occur in high-leverage situations are discussed; second, diagnostic procedures designed to avoid masking are shown. We comment on the procedures by Hadi and Simonoff [28.1,2], Atkinson [28.3] and Swallow and Kianifard [28.4], which are based on finding a clean subset for estimating the parameters and then increasing its size by incorporating new homogeneous observations one by one, until a heterogeneous observation is found. We also discuss procedures for detecting high-leverage outliers in large data sets based on eigenvalue analysis of the influence and sensitivity matrices, as proposed by Peña and Yohai [28.5,6]. Finally, we show that simple univariate statistics, such as predictive residuals and Cookʼs distances, used together with the sensitivity statistic proposed by Peña [28.7], can be a useful diagnostic tool for large high-dimensional data sets.
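As a rough illustration of the single-observation statistics mentioned above, the following sketch computes leverages, predictive (leave-one-out) residuals and Cookʼs distances for an ordinary least-squares fit. It uses the standard textbook formulas rather than code from the chapter; the function name `influence_diagnostics` and the simulated data are assumptions made for this example, and the sensitivity statistic of Peña [28.7] is not implemented here.

```python
import numpy as np


def influence_diagnostics(X, y):
    """Leverages, predictive residuals and Cook's distances for OLS.

    Hypothetical helper for illustration; X is assumed to have full
    column rank and to include an intercept column if one is wanted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Thin QR decomposition: the hat matrix is H = Q Q^T, so the
    # leverages h_i are the squared row norms of Q.
    Q, R = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)

    beta = np.linalg.solve(R, Q.T @ y)   # least-squares coefficients
    e = y - X @ beta                     # ordinary residuals
    s2 = e @ e / (n - p)                 # residual variance estimate

    # Predictive residual: y_i minus its prediction when observation i
    # is left out of the fit, obtained without refitting.
    pred_resid = e / (1.0 - h)

    # Cook's distance: D_i = e_i^2 h_i / (p s^2 (1 - h_i)^2).
    cooks = e**2 * h / (p * s2 * (1.0 - h) ** 2)
    return h, pred_resid, cooks


if __name__ == "__main__":
    # Simulated data with one planted high-leverage outlier (row 0).
    rng = np.random.default_rng(0)
    X = np.column_stack([np.ones(30), rng.normal(size=30)])
    y = 1.0 + 2.0 * X[:, 1] + rng.normal(scale=0.5, size=30)
    X[0, 1], y[0] = 6.0, -5.0
    h, pred_resid, cooks = influence_diagnostics(X, y)
    print("largest Cook's distance at row", int(np.argmax(cooks)))
```

Because these statistics delete one observation at a time, a group of similar high-leverage outliers can hide one another, which is the masking problem that motivates the multiple-outlier procedures reviewed in the chapter.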
Peña, D. (2006). Measures of Influence and Sensitivity in Linear Regression. In Springer Handbooks (pp. 523–536). Springer. https://doi.org/10.1007/978-1-84628-288-1_28