Abstract
Advances in next-generation sequencing have provided high-dimensional RNA-seq datasets, allowing the stratification of some tumor patients based on their transcriptomic profiles. Machine learning methods have been used to reduce and cluster high-dimensional data. Recently, uniform manifold approximation and projection (UMAP) was applied to project genomic datasets in low-dimensional Euclidean latent space. Here, we evaluated how different representations of the UMAP embedding can impact the analysis of breast cancer (BC) stratification. We projected BC RNA-seq data on Euclidean, spherical, and hyperbolic spaces, and stratified BC patients via clustering algorithms. We also proposed a pipeline to yield more reproducible clustering outputs. The results show how the selection of the latent space can affect downstream stratification results and suggest that the exploration of different geometrical representations is recommended to explore data structure and samples’ relationships.
Author supplied keywords
Cite
CITATION STYLE
Bollon, J., Assale, M., Cina, A., Marangoni, S., Calabrese, M., Salvemini, C. B., … Cavalli, A. (2022). Investigating How Reproducibility and Geometrical Representation in UMAP Dimensionality Reduction Impact the Stratification of Breast Cancer Tumors. Applied Sciences (Switzerland), 12(9). https://doi.org/10.3390/app12094247
Register to see more suggestions
Mendeley helps you to discover research relevant for your work.