To interpret or not to interpret PCA? This is our question


Abstract

Principal Component Analysis (PCA) is a central tool for analyzing data, and social media data in particular. Typically, the data is projected onto the first two PCs to obtain a two-dimensional view, and trends and patterns are then examined. A key to making sense of the projected data is the semantic interpretation of the new axes (the PCs). To label the PCs, one usually looks at the top k vector entries in absolute value and assigns meaning according to them. The choice of k is made by “eyeballing” the vector. In this work we provide a computational framework to support this process and suggest an interpretability score, which measures how sensitive the interpretation step could be to the choice of k. Furthermore, we give a visual method to choose the optimal k. We study our methodology on four social media platforms and discover that on two of them, Twitter and Instagram, interpretation can be done in a carefree manner, whereas on Steam and LinkedIn there is no natural labeling of the axes. This separation is clearly reflected in the interpretability score that each dataset received.
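The labeling step described above (project the data onto the first two PCs, then read off the top k entries of each PC in absolute value) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' framework: the function names, the synthetic data, and the feature names `a` to `d` are hypothetical, and the interpretability score itself is not reproduced because the abstract does not define it.

```python
import numpy as np


def principal_components(X, n_components=2):
    """Return the leading principal directions of X (one per row) via SVD."""
    Xc = X - X.mean(axis=0)  # PCA operates on column-centered data
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:n_components]


def project(X, pcs):
    """Project the centered data onto the given PCs (one output column per PC)."""
    return (X - X.mean(axis=0)) @ pcs.T


def top_k_features(pc, feature_names, k):
    """Names of the k entries of a PC vector with the largest absolute value."""
    order = np.argsort(-np.abs(pc), kind="stable")
    return [feature_names[i] for i in order[:k]]


# Hypothetical synthetic data: features "a" and "b" share one strong signal,
# while "c" and "d" are low-variance noise.
rng = np.random.default_rng(0)
t = rng.normal(size=200)
X = np.column_stack([
    3 * t,
    3 * t,
    0.1 * rng.normal(size=200),
    0.1 * rng.normal(size=200),
])
names = ["a", "b", "c", "d"]

pcs = principal_components(X, n_components=2)
Z = project(X, pcs)  # the two-dimensional view, shape (200, 2)

# The label attached to PC1 depends on k; listing several k values shows
# how the interpretation shifts as more entries are taken into account.
for k in range(1, len(names) + 1):
    print(k, top_k_features(pcs[0], names, k))
```

Taking absolute values makes the ranking independent of the arbitrary sign that SVD assigns to each principal direction. In this toy example PC1 is dominated by `a` and `b`, so any k of 2 yields the same label; real data with a flatter PC vector would show the label changing noticeably with k, which is the sensitivity the paper sets out to measure.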


Vilenchik, D., Yichye, B., & Abutbul, M. (2019). To interpret or not to interpret PCA? This is our question. In Proceedings of the 13th International Conference on Web and Social Media, ICWSM 2019 (pp. 655–658). Association for the Advancement of Artificial Intelligence. https://doi.org/10.1609/icwsm.v13i01.3265
