Principal component analysis
© Agilent Technologies, Inc. 2005 sig_support@agilent.com | Main 866.744.7638

Principal Components Analysis

Contents at a glance

I. Introduction
II. What is Principal Components Analysis?
III. When to use Principal Components Analysis?
IV. How to use the PCA tool?
    A. PCA on Genes
    B. PCA on Conditions
V. How to interpret the results?
    A. PCA on genes
    B. PCA on conditions
VI. Technical details
VII. Frequently asked questions
VIII. Literature
I. Introduction

When measuring only two variables, such as height and weight in a dozen patients, it is easy to plot the data and visually assess the correlation between the two factors. However, in a typical microarray experiment, the expression of thousands of genes is measured across many conditions, such as treatments or time points. It therefore becomes impossible to visually inspect the relationships between genes or conditions in such a multi-dimensional matrix. One way to make sense of this data is to reduce its dimensionality. Several data decomposition techniques are available for this purpose. Principal Components Analysis (PCA) is one such technique; it is typically used to reduce the data to two dimensions.

II. What is Principal Components Analysis?

Principal Components Analysis is a method that reduces data dimensionality by performing a covariance analysis between factors. As such, it is suitable for data sets with many dimensions, such as a large gene expression experiment.

Let's take an example that illustrates how PCA works with a microarray experiment. Say that you measure 10,000 genes in 8 different patients. These values form a matrix of 8 x 10,000 measurements. Now imagine that each of these 10,000 genes is plotted as a point on a multi-dimensional scatter plot consisting of 8 axes, one for each patient. The result is a cloud of values in multi-dimensional space.

To characterize the trends exhibited by this data, PCA extracts the directions along which the cloud is most extended. For instance, if the cloud is shaped like a football, the main direction of the data would be a midline or axis along the length of the football. This is called the first component, or the principal component. PCA then looks for the next direction, orthogonal to the first one, reducing the multi-dimensional cloud to a two-dimensional space.
The second component would be the axis along the football width (Fig. 1).
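The football example can be sketched numerically. The following is a minimal illustration, assuming NumPy and a synthetic three-dimensional cloud; the function name `pca` and the simulated data are hypothetical and are not taken from GeneSpring. It centers the data, computes the covariance matrix, and takes the eigenvectors with the largest eigenvalues as the principal components.

```python
import numpy as np


def pca(data, n_components=2):
    """Covariance-based PCA sketch. Rows are observations, columns are variables."""
    # Center each variable so the covariance describes spread around the mean.
    centered = data - data.mean(axis=0)
    # Covariance between variables (columns).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals = eigvals[order]
    eigvecs = eigvecs[:, order]
    # Each row of `components` is one principal direction.
    components = eigvecs[:, :n_components].T
    # Fraction of the total variance captured by each kept component.
    explained = eigvals[:n_components] / eigvals.sum()
    # Coordinates of every observation in the reduced space.
    scores = centered @ components.T
    return components, explained, scores


# Synthetic "football": long along the first axis, narrower along the second,
# and thin along the third (assumed values, for illustration only).
rng = np.random.default_rng(0)
cloud = rng.normal(size=(1000, 3)) * np.array([5.0, 2.0, 0.5])

components, explained, scores = pca(cloud)
print("first component:", components[0])
print("second component:", components[1])
print("variance explained:", explained)
```

In this sketch the first component should lie close to the long axis of the cloud and the second close to its width, and `scores` holds the two-dimensional coordinates that would be plotted.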
Fig. 1: Football-shaped data set with two main components (axes labeled PC 1 and PC 2).

In this particular example, these two components explain most of the cloud's trends. In a more complex data set, additional components might add information about interesting trends in the data. In GeneSpring, PCA can be performed based on gene expression profiles, or based on samples or conditions.

III. When to use Principal Components Analysis?

PCA is recommended as an exploratory tool to uncover unknown trends in the data. PCA on genes provides a way to identify predominant gene expression patterns. When applied to conditions, PCA explores correlations between samples or conditions. Note that because the goal of PCA is to "summarize" the data, it is not considered a clustering tool. PCA does not attempt to group genes by user-specified criteria, as clustering methods do.

IV. How to use the PCA tool?

A preliminary consideration is whether to perform PCA on genes or on conditions. This decision will depend on the type of experiment and the type of questions you wish to answer. In most cases, only one type of PCA will need to be run on your experiment.
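The difference between the two modes can be illustrated outside GeneSpring as a choice of which dimension of the expression matrix is treated as the observations. The sketch below is an assumption-laden illustration, not GeneSpring's implementation: it uses NumPy, a random genes x conditions matrix, and a hypothetical helper `principal_components` that obtains the components from a singular value decomposition of the centered data.

```python
import numpy as np


def principal_components(observations, n_components=2):
    """Return the leading principal directions and their share of the variance."""
    centered = observations - observations.mean(axis=0)
    # SVD of the centered data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return vt[:n_components], explained[:n_components]


# Hypothetical expression matrix: 200 genes (rows) measured in 8 conditions (columns).
rng = np.random.default_rng(1)
n_genes, n_conditions = 200, 8
expression = rng.normal(size=(n_genes, n_conditions))

# PCA on genes: each gene is a point in 8-dimensional condition space, so each
# component is a profile across the 8 conditions.
gene_components, gene_explained = principal_components(expression)

# PCA on conditions: each condition is a point in 200-dimensional gene space, so
# each component assigns one weight per gene.
condition_components, condition_explained = principal_components(expression.T)

print(gene_components.shape)
print(condition_components.shape)
```

The SVD is used here instead of an explicit covariance matrix because, with thousands of genes, the gene-by-gene covariance matrix would be very large; both routes give the same principal directions up to sign.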