Abstract
Dimensionality reduction is a common tool for visualization and inference of population structure from genotypes, but popular methods either return too many dimensions for easy plotting (PCA) or fail to preserve global geometry (t-SNE and UMAP). Here we explore the utility of variational autoencoders (VAEs), generative machine learning models in which a pair of neural networks seek to first compress and then recreate the input data, for visualizing population genetic variation. VAEs incorporate nonlinear relationships, allow users to define the dimensionality of the latent space, and in our tests preserve global geometry better than t-SNE and UMAP. Our implementation, which we call popvae, is available as a command-line Python program at github.com/kr-colab/popvae. The approach yields latent embeddings that capture subtle aspects of population structure in humans and Anopheles mosquitoes, and can generate artificial genotypes characteristic of a given sample or population.
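The encode-then-decode idea described above can be sketched in a few lines of NumPy. This is a minimal illustration, not popvae's actual implementation: the toy dimensions, randomly initialized weights (standing in for trained parameters), and layer sizes are all assumptions, but the flow (encoder producing a mean and log-variance, the reparameterization trick sampling latent coordinates, and a decoder mapping them back to per-SNP frequencies) is the standard VAE structure the abstract refers to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions (not from the paper): 100 samples,
# 500 biallelic SNPs encoded as genotype counts {0, 1, 2}, and a
# 2-D latent space chosen for easy plotting.
n_samples, n_snps, hidden, latent_dim = 100, 500, 32, 2
genotypes = rng.integers(0, 3, size=(n_samples, n_snps)).astype(float)

# Randomly initialized weights stand in for a trained network.
W_enc = rng.normal(0, 0.05, (n_snps, hidden))
W_mu = rng.normal(0, 0.05, (hidden, latent_dim))
W_logvar = rng.normal(0, 0.05, (hidden, latent_dim))
W_dec = rng.normal(0, 0.05, (latent_dim, n_snps))

# Encoder: compress each sample's genotypes to a latent mean and
# log-variance.
h = np.tanh(genotypes @ W_enc)
mu = h @ W_mu
logvar = h @ W_logvar

# Reparameterization trick: sample z ~ N(mu, sigma^2) so the model
# stays differentiable during training.
z = mu + np.exp(0.5 * logvar) * rng.normal(size=mu.shape)

# Decoder: map latent coordinates back to per-SNP frequencies, from
# which artificial genotypes could be sampled.
reconstruction = 1.0 / (1.0 + np.exp(-(z @ W_dec)))

print(z.shape)               # 2-D latent embedding used for plotting
print(reconstruction.shape)  # same shape as the input genotype matrix
```

Plotting `z` directly is what gives the two-dimensional visualizations of population structure; sampling genotypes from `reconstruction` corresponds to the artificial-genotype generation mentioned in the abstract.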
Citation
Battey, C. J., Coffing, G. C., & Kern, A. D. (2021). Visualizing population structure with variational autoencoders. G3: Genes, Genomes, Genetics, 11(1). https://doi.org/10.1093/G3JOURNAL/JKAA036