Online Principal Component Analysis in High Dimension: Which Algorithm to Choose?

Abstract

Principal component analysis (PCA) is a method of choice for dimension reduction. In the current context of data explosion, online techniques that do not require storing all data in memory are indispensable to perform the PCA of streaming data and/or massive data. Despite the wide availability of recursive algorithms that can efficiently update the PCA when new data are observed, the literature offers little guidance on how to select a suitable algorithm for a given application. This paper reviews the main approaches to online PCA, namely, perturbation techniques, incremental methods and stochastic optimisation, and compares the most widely employed techniques in terms of statistical accuracy, computation time and memory requirements using artificial and real data. Extensions of online PCA to missing data and to functional data are detailed. All studied algorithms are available in the package onlinePCA on CRAN.
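
To make the idea of a recursive update concrete, the following is a minimal sketch of one stochastic-optimisation scheme, Oja's rule, for the first principal direction only. It is an illustration under stated assumptions and not the implementation from the onlinePCA package: the function names, the step sequence c / n, the starting direction and the simulated two-dimensional stream are all hypothetical choices made for this example.

```python
import math
import random


def online_first_component(stream, dim, c=1.0):
    """Estimate the first principal direction of a stream with Oja's rule.

    Only the running mean and the current direction are kept in memory,
    so storage is O(dim) whatever the number of observations.
    The step sequence c / n is an assumption made for this sketch.
    """
    mean = [0.0] * dim
    w = [1.0] + [0.0] * (dim - 1)  # arbitrary unit-norm starting direction
    for n, x in enumerate(stream, start=1):
        # Recursive mean update, so the data can be centred without storing it.
        mean = [m + (xi - m) / n for m, xi in zip(mean, x)]
        xc = [xi - m for xi, m in zip(x, mean)]
        # Oja step: move w along xc * <xc, w> with a decreasing step c / n.
        proj = sum(a * b for a, b in zip(xc, w))
        w = [wi + (c / n) * proj * a for wi, a in zip(w, xc)]
        # Renormalise so that w stays on the unit sphere.
        norm = math.sqrt(sum(wi * wi for wi in w))
        w = [wi / norm for wi in w]
    return w


def simulated_stream(n_obs, seed=0):
    """Hypothetical 2-D stream whose main axis of variation is (1, 1) / sqrt(2)."""
    rng = random.Random(seed)
    s = 1.0 / math.sqrt(2.0)
    for _ in range(n_obs):
        a = rng.gauss(0.0, 3.0)  # large variance along (1, 1)
        b = rng.gauss(0.0, 0.5)  # small variance along (1, -1)
        yield [s * (a + b), s * (a - b)]


w = online_first_component(simulated_stream(5000), dim=2)
print(w)  # should be close to +/- (0.707, 0.707)
```

Each observation is processed once and then discarded, which is the property that makes such schemes usable on streaming or massive data. The sign of the estimated direction is arbitrary, as for any principal component.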

Cardot, H., & Degras, D. (2018). Online Principal Component Analysis in High Dimension: Which Algorithm to Choose? International Statistical Review, 86(1), 29–50. https://doi.org/10.1111/insr.12220
