In recent years, technological advances have produced a wealth of very high-dimensional data, and combining high-dimensional information from multiple sources is becoming increasingly important in an expanding range of scientific disciplines. Partial Least Squares Correlation (PLSC) is a frequently used method for multivariate multimodal data integration. It is, however, computationally expensive in applications involving large numbers of variables, as required, for example, in functional genomics. To handle high-dimensional problems, dimensionality reduction can be applied as a pre-processing step. We propose a new approach that incorporates Random Projection (RP) for dimensionality reduction into PLSC to efficiently solve high-dimensional multimodal problems such as genotype-phenotype associations. We name our new method PLSC-RP. Using simulated and experimental data sets containing whole-genome SNP measures as genotypes and whole-brain neuroimaging measures as phenotypes, we demonstrate that PLSC-RP is drastically faster than traditional PLSC while providing statistically equivalent results. We also provide evidence that dimensionality reduction using RP is independent of data type. PLSC-RP therefore opens up a wide range of possible applications: it can be used for any integrative analysis that combines information from multiple sources.
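The idea of combining RP with PLSC can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: it assumes that PLSC is computed as the singular value decomposition of the cross-product matrix of the column-centered data blocks, that the high-dimensional block `X` is reduced with a Gaussian random matrix `R` to `k` columns, and that the saliences for `X` are recovered by back-projecting through the original data. The function names `plsc` and `plsc_rp`, the parameter `k`, and the back-projection step are all assumptions made for illustration.

```python
import numpy as np


def plsc(X, Y, n_components=2):
    """Plain PLSC: SVD of the cross-product matrix of column-centered X and Y."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc  # p x q cross-product matrix (costly when p is very large)
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    return U[:, :n_components], s[:n_components], Vt[:n_components].T


def plsc_rp(X, Y, k, n_components=2, seed=0):
    """Sketch of PLSC after a Gaussian random projection of X to k columns.

    Assumption: only X is high-dimensional; Y is left unprojected.
    """
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    p = Xc.shape[1]

    # Gaussian random projection matrix, scaled so that E[R @ R.T] = I_p.
    R = rng.standard_normal((p, k)) / np.sqrt(k)
    Xr = Xc @ R  # n x k reduced block

    # SVD of the much smaller k x q cross-product matrix.
    _, _, Vt = np.linalg.svd(Xr.T @ Yc, full_matrices=False)
    V = Vt[:n_components].T  # saliences for Y

    # Back-projection (assumed step): for an exact V, Xc.T @ Yc @ V = U * s,
    # so the column norms estimate the singular values of the full
    # cross-product matrix and the normalized columns estimate U.
    U = Xc.T @ (Yc @ V)
    s = np.linalg.norm(U, axis=0)
    return U / s, s, V
```

Calling `plsc_rp(X, Y, k=400)` on an `n x p` genotype-like matrix and an `n x q` phenotype-like matrix returns approximate saliences for both blocks without ever decomposing the full `p x q` cross-product matrix. How closely the result matches plain PLSC depends on `k` and on the strength of the shared signal.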
Grellmann, C., Neumann, J., Bitzer, S., Kovacs, P., Tönjes, A., Westlye, L. T., … Horstmann, A. (2016). Random projection for fast and efficient multivariate correlation analysis of high-dimensional data: A new approach. Frontiers in Genetics, 7, 102. https://doi.org/10.3389/fgene.2016.00102