The classical multivariate 2 sample significance test based on Hotelling's $T^2$ is undefined when the number $k$ of variables exceeds the number of within sample degrees of freedom available for estimation of variances and covariances. Addition of an a priori Euclidean metric to the affine $k$-space assumed by the classical method leads to an alternative approach to the same problem. A test statistic $F$ which is the ratio of 2 mean square distances is proposed and 3 methods of attaching a significance level to $F$ are described. The third method is considered in detail and leads to a "non-exact" significance test where the null hypothesis distribution of $F$ depends, in approximation, on a single unknown parameter $r$ for which an estimate must be substituted. Approximate distribution theory leads to 2 independent estimates of $r$ based on nearly sufficient statistics and these may be combined to yield a single estimate. A test of $F$ nominally at the 5% level but based on an estimate of $r$ rather than $r$ itself has a true significance level which is a function of $r$. This function is investigated and shown to be quite near 5%. The sensitivity of the test to a parameter measuring statistical distance between population means is discussed and it is shown that arbitrarily small differences in each individual variable can result in a detectable overall difference provided the number of variables (or, more precisely, $r$) can be made sufficiently large. This sensitivity discussion has stated implications for the a priori choice of metric in $k$-space. Finally a geometrical description of the case of large $r$ is presented.
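The abstract does not spell out the formula for $F$, so the following is only a minimal sketch of one way a ratio of two mean square distances could be computed. It assumes the ordinary (identity) Euclidean metric, a between-sample squared distance scaled by $n_1 n_2 / (n_1 + n_2)$, and a within-sample mean square on $n_1 + n_2 - 2$ degrees of freedom. The function names are hypothetical. The permutation p-value is a generic stand-in for attaching a significance level; it is not claimed to be any of the three methods in the paper, and it does not implement the approximate test based on an estimate of $r$.

```python
import random


def mean_vector(rows):
    """Coordinate-wise mean of a list of equal-length observation vectors."""
    n = len(rows)
    k = len(rows[0])
    return [sum(row[j] for row in rows) / n for j in range(k)]


def sq_dist(u, v):
    """Squared Euclidean distance (assumed identity metric on k-space)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def distance_ratio_F(x, y):
    """Ratio of a between-sample to a within-sample mean square distance.

    Assumed form, not taken from the abstract: the numerator has one
    degree of freedom, the denominator has n1 + n2 - 2.
    """
    n1, n2 = len(x), len(y)
    mx, my = mean_vector(x), mean_vector(y)
    between = (n1 * n2 / (n1 + n2)) * sq_dist(mx, my)
    within = sum(sq_dist(row, mx) for row in x) + sum(sq_dist(row, my) for row in y)
    return between / (within / (n1 + n2 - 2))


def permutation_p_value(x, y, n_perm=999, seed=0):
    """Generic permutation p-value for the ratio (a stand-in, see lead-in)."""
    rng = random.Random(seed)
    observed = distance_ratio_F(x, y)
    pooled = list(x) + list(y)
    n1 = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign observations to the two samples
        if distance_ratio_F(pooled[:n1], pooled[n1:]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy data with k = 3 variables but only 2 within-sample degrees of freedom,
# the situation in which Hotelling's T^2 is undefined.
x = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
y = [[1.0, 1.0, 1.0], [1.0, 3.0, 1.0]]
print(distance_ratio_F(x, y))  # 2.5
```

Unlike $T^2$, this ratio needs no inverse of the sample covariance matrix, which is why it remains defined when $k$ exceeds the within-sample degrees of freedom. The price is that the result depends on the metric chosen for $k$-space, the point the abstract raises in its sensitivity discussion.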
Dempster, A. P. (1958). A High Dimensional Two Sample Significance Test. The Annals of Mathematical Statistics, 29(4), 995–1010. https://doi.org/10.1214/aoms/1177706437