Silhouette-Based Evaluation of PCA, Isomap, and t-SNE on Linear and Nonlinear Data Structures

Abstract

Dimensionality reduction is fundamental for analyzing high-dimensional data, supporting visualization, denoising, and structure discovery. We present a systematic, large-scale benchmark of three widely used methods, Principal Component Analysis (PCA), Isometric Mapping (Isomap), and t-Distributed Stochastic Neighbor Embedding (t-SNE), evaluated by average silhouette scores to quantify cluster preservation after embedding. Our full factorial simulation varies sample size (n), noise variance, and feature count (p) under four generative regimes: (1) a linear Gaussian mixture, (2) a linear Student-t mixture with heavy tails, (3) a nonlinear Swiss-roll manifold, and (4) a nonlinear concentric-spheres manifold, each replicated 1000 times per condition. Beyond empirical comparisons, we provide mathematical results that explain the observed rankings: under standard separation and sampling assumptions, PCA maximizes silhouettes for linear, low-rank structure, whereas Isomap dominates on smooth curved manifolds; t-SNE prioritizes local neighborhoods, yielding strong local separation but less reliable global geometry. Empirically, PCA consistently achieves the highest silhouettes for linear structure (Isomap second, t-SNE third); on manifolds the ordering changes to Isomap > t-SNE > PCA. Increasing the noise variance and adding uninformative dimensions (larger p) degrade all methods, while larger n raises silhouette levels and improves their stability. To our knowledge, this is the first integrated study combining a comprehensive factorial simulation across linear and nonlinear regimes with distribution-based summaries (density and violin plots) and supporting theory that predicts method orderings. The results offer clear, practice-oriented guidance: prefer PCA when structure is approximately linear; favor manifold learning, especially Isomap, when curvature is present; and use t-SNE for exploratory visualization of local neighborhoods. Complete tables and replication materials are provided to facilitate method selection and reproducibility.
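
The evaluation described in the abstract (embed the data, then compute the average silhouette score against known cluster labels) can be illustrated with a minimal sketch. This is not the authors' replication code. It assumes scikit-learn, a single Swiss-roll replicate, a two-dimensional embedding, and cluster labels defined by cutting the roll coordinate into four equal-count segments; the abstract specifies none of these choices, and the function name `silhouette_by_method` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, Isomap
from sklearn.metrics import silhouette_score


def silhouette_by_method(X, labels, seed=0):
    """Embed X in 2-D with each method and return its average silhouette score.

    Hyperparameters (n_neighbors=10, perplexity=30) are illustrative
    assumptions, not values taken from the paper.
    """
    embedders = {
        "PCA": PCA(n_components=2),
        "Isomap": Isomap(n_neighbors=10, n_components=2),
        "t-SNE": TSNE(n_components=2, perplexity=30.0, random_state=seed),
    }
    return {
        name: float(silhouette_score(model.fit_transform(X), labels))
        for name, model in embedders.items()
    }


# Assumed setup: one replicate of 500 points on a lightly noised Swiss roll.
X, t = make_swiss_roll(n_samples=500, noise=0.1, random_state=0)

# Assumed labeling: four equal-count segments along the roll coordinate t.
labels = np.digitize(t, np.quantile(t, [0.25, 0.5, 0.75]))

scores = silhouette_by_method(X, labels)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

A single run like this is one replicate only; the study summarizes the distribution of scores over 1000 replicates per condition, so a loop over seeds and over the n, noise, and p settings would be needed to approximate that design.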

Cite

Zahed, M., & Skafyan, M. (2025). Silhouette-Based Evaluation of PCA, Isomap, and t-SNE on Linear and Nonlinear Data Structures. Stats, 8(4). https://doi.org/10.3390/stats8040105
