Principal component analysis

by I T Jolliffe
Applied Optics

Abstract

Simulated event-related potential (ERP) components were used to investigate the ability of principal component analysis (PCA), Varimax rotation and univariate analysis of variance (ANOVA) to reconstruct component wave shapes, to allocate variance correctly across components, and to identify the correct locus of simulated experimental treatments. The simulated ERPs consisted of 800 randomly weighted combinations of three 64-point components, corresponding to a 2 X 2 X 10 repeated-measures design with 20 subjects. Covariance PCAs, Varimax rotations and univariate ANOVAs were performed on each of 400 such simulations, 100 with no effect of any experimental treatment and 100 each with main effects on each of the 3 components. Eight hundred additional simulations were performed to investigate the effects of systematic variations in the size of the experimental treatments and the number of subjects per experiment. The wave shapes of the simulated components were reconstructed reasonably well, although not completely, by the rotated principal component (PC) loadings. However, comparison of rotated PC scores with the random weights used to generate the simulated ERPs indicated that PCA incorrectly allocated variance across overlapping components, producing dramatic increases in type I error (the largest in excess of 80%) for ANOVAs on one component when the true treatment effect was on another. Although these results should not be overgeneralized, they clearly demonstrate that the PCA-Varimax-ANOVA strategy can incorrectly distribute variance across components, resulting in serious misinterpretation of treatment effects. Additional simulation studies are needed to determine the generality of the variance misallocation problem; pending the outcome of such studies, results obtained with the PCA-Varimax-ANOVA strategy should be interpreted cautiously.
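The data-generation and covariance-PCA steps described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the Gaussian wave shapes, their centers and widths, and the weight distribution are invented here and are not the ones used in the study, and the Varimax rotation and ANOVA stages are omitted.

```python
import numpy as np

rng = np.random.default_rng(42)

# Sizes taken from the abstract: 800 waveforms, 64 time points, 3 components.
n_waveforms, n_points, n_components = 800, 64, 3
t = np.arange(n_points)

def bump(center, width):
    """Gaussian-shaped wave (hypothetical shape, chosen only for illustration)."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

# Three temporally overlapping component wave shapes (assumed, not from the paper).
true_shapes = np.stack([bump(20, 6), bump(30, 8), bump(44, 10)])   # (3, 64)

# Random weights, one per component per simulated ERP (assumed distribution).
weights = rng.normal(loc=1.0, scale=0.5, size=(n_waveforms, n_components))
erps = weights @ true_shapes                                       # (800, 64)

# Covariance PCA: eigendecomposition of the 64 x 64 covariance matrix.
centered = erps - erps.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]            # re-sort to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Share of total variance captured by the first three principal components.
explained = eigvals[:n_components].sum() / eigvals.sum()

# Unrotated PC scores, and their correlation with the true generating weights.
scores = centered @ eigvecs[:, :n_components]                      # (800, 3)
corr = np.corrcoef(scores.T, weights.T)[:n_components, n_components:]
print(f"variance explained by 3 PCs: {explained:.4f}")
print(np.round(corr, 2))
```

Each row of `corr` belongs to one principal component and each column to one true component. When a row has sizeable entries in more than one column, that PC score mixes variance from several overlapping components, which is the kind of misallocation the abstract warns about.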

Available from centaur.reading.ac.uk
© Agilent Technologies, Inc. 2005 sig_support@agilent.com | Main 866.744.7638

Principal Components Analysis

Contents at a glance

I. Introduction
II. What is Principal Components Analysis?
III. When to use Principal Components Analysis?
IV. How to use the PCA tool?
    A. PCA on Genes
    B. PCA on Conditions
V. How to interpret the results?
    A. PCA on genes
    B. PCA on conditions
VI. Technical details
VII. Frequently asked questions
VIII. Literature

I. Introduction

When measuring only two variables, such as height and weight in a dozen patients, it is easy to plot the data and visually assess the correlation between the two factors. However, in a typical microarray experiment, the expression of thousands of genes is measured across many conditions, such as treatments or time points. It therefore becomes impossible to visually inspect the relationships between genes or conditions in such a multi-dimensional matrix. One way to make sense of this data is to reduce its dimensionality. Several data decomposition techniques are available for this purpose; Principal Components Analysis (PCA) is one such technique, reducing the data to two dimensions.

II. What is Principal Components Analysis?

Principal Components Analysis is a method that reduces data dimensionality by performing a covariance analysis between factors. As such, it is suitable for data sets in multiple dimensions, such as a large gene expression experiment.

Let's take an example that illustrates how PCA works with a microarray experiment. Say that you measure 10,000 genes in 8 different patients. These values form a matrix of 8 x 10,000 measurements. Now imagine that each of these 10,000 genes is plotted on a multi-dimensional scatter plot consisting of 8 axes, one for each patient. The result is a cloud of values in multi-dimensional space.

To characterize the trends exhibited by this data, PCA extracts the directions in which the cloud is most extended. For instance, if the cloud is shaped like a football, the main direction of the data would be a midline or axis along the length of the football. This is called the first component, or the principal component. PCA then looks for the next direction, orthogonal to the first one, reducing the multi-dimensional cloud to a two-dimensional space. The second component would be the axis along the football width (Fig. 1).
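The covariance analysis and the search for orthogonal directions described in this section can be sketched with NumPy as follows. This is a minimal illustration, not GeneSpring's implementation: the function name `pca_covariance` and the synthetic data are hypothetical, and the matrix is laid out with one row per gene and one column per patient so that each gene is a point on 8 axes.

```python
import numpy as np

def pca_covariance(data, n_components=2):
    """PCA by eigendecomposition of the covariance matrix.

    data: array of shape (n_points, n_axes); each row is one point in the cloud.
    Returns (components, scores, explained_ratio).
    """
    centered = data - data.mean(axis=0)           # move the cloud to the origin
    cov = np.cov(centered, rowvar=False)          # (n_axes, n_axes) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]             # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components].T      # orthogonal directions
    scores = centered @ components.T              # coordinates along each direction
    explained_ratio = eigvals[:n_components] / eigvals.sum()
    return components, scores, explained_ratio

# Synthetic stand-in for 10,000 genes measured in 8 patients (made-up data):
# a strong shared trend plus noise gives an elongated, football-like cloud.
rng = np.random.default_rng(0)
n_genes, n_patients = 10_000, 8
trend = 3.0 * rng.normal(size=(n_genes, 1))
data = trend @ np.ones((1, n_patients)) + rng.normal(size=(n_genes, n_patients))

components, scores, ratio = pca_covariance(data, n_components=2)
print("fraction of variance per component:", np.round(ratio, 3))
```

Plotting `scores[:, 0]` against `scores[:, 1]` would give the two-dimensional view of the cloud, with the first axis running along its longest direction.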

Fig 1: Football-shaped data set with two main components (axes labeled PC 1 and PC 2).

In this particular example, these two components explain most of the cloud's trends. In a more complex data set, more components might add information about interesting trends in the data. In GeneSpring, PCA can be performed based on gene expression profiles, or based on samples or conditions.

III. When to use Principal Components Analysis?

PCA is recommended as an exploratory tool to uncover unknown trends in the data. PCA on genes provides a way to identify predominant gene expression patterns. When applied to conditions, PCA explores correlations between samples or conditions. Note that because the goal of PCA is to "summarize" the data, it is not considered a clustering tool. Unlike clustering methods, PCA does not attempt to group genes by user-specified criteria.

IV. How to use the PCA tool?

A preliminary consideration is whether to perform PCA on genes or on conditions. This decision will depend on the type of experiment and the type of questions you wish to answer. In most cases, only one type of PCA will need to be run on your experiment.
