t-distributed Stochastic Neighbor Embedding (t-SNE) is a flexible, non-parametric method for mapping high-dimensional data onto a two- or three-dimensional subspace for visualization. This paper examines the effects of using different intermediate data reduction algorithms (e.g., Principal Component Analysis, Independent Component Analysis, Linear Discriminant Analysis, Sammon Mapping, and Locally Linear Embedding) to first reduce the data to an intermediate subspace before applying t-SNE. Our research shows that no intermediate step in the visualization process is trivial, and that application-dependent knowledge should be used to ensure the best possible visualization in lower-dimensional spaces. Experimental results are presented for several common data sets, illustrating that, for clustering applications and for visualizing class separation in multi-class data, each algorithm tested results in significantly different mappings.
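The two-stage pipeline described above (intermediate reduction, then t-SNE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' experimental setup: the use of scikit-learn, the digits data set, the 300-sample subset, the 20-dimensional intermediate subspace, and the helper name `reduce_then_tsne` are all choices made here for illustration. Only two of the reducers named in the abstract (PCA and ICA) are shown.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, FastICA
from sklearn.manifold import TSNE


def reduce_then_tsne(X, reducer, random_state=0):
    """Hypothetical helper: reduce X with `reducer`, then embed in 2-D with t-SNE."""
    # Stage 1: map the data to an intermediate subspace.
    X_mid = reducer.fit_transform(X)
    # Stage 2: apply t-SNE to the intermediate representation.
    tsne = TSNE(n_components=2, perplexity=30.0, init="random",
                random_state=random_state)
    return tsne.fit_transform(X_mid)


# Assumption: a small subset of the scikit-learn digits data keeps the run short.
X, y = load_digits(return_X_y=True)
X, y = X[:300], y[:300]

# Assumption: 20 intermediate dimensions, chosen only for illustration.
embeddings = {
    "pca": reduce_then_tsne(X, PCA(n_components=20, random_state=0)),
    "ica": reduce_then_tsne(X, FastICA(n_components=20, random_state=0,
                                       max_iter=1000)),
}

for name, emb in embeddings.items():
    print(name, emb.shape)
```

Plotting each entry of `embeddings` colored by `y` gives a side-by-side view of how the choice of intermediate reducer changes the final map.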
Campbell, A., Caudle, K., & Hoover, R. C. (2019). Examining intermediate data reduction algorithms for use with t-SNE. In ACM International Conference Proceeding Series (pp. 36–42). Association for Computing Machinery. https://doi.org/10.1145/3314545.3314549