Exploring gene content with pangene graphs

16Citations
Citations of this article
31Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Motivation: The gene content regulates the biology of an organism. It varies between species and between individuals of the same species. Although tools have been developed to identify gene content changes in bacterial genomes, none is applicable to collections of large eukaryotic genomes such as the human pangenome. Results: We developed pangene, a computational tool to identify gene orientation, gene order, and gene copy-number changes in a collection of genomes. Pangene aligns a set of input protein sequences to the genomes, resolves redundancies between protein sequences and constructs a gene graph with each genome represented as a walk in the graph. It additionally finds subgraphs, which we call bibubbles, that capture gene content changes. Applied to the human pangenome, pangene identifies known gene-level variations and reveals complex haplotypes that are not well studied before. Pangene also works with high-quality bacterial pangenome and reports similar numbers of core and accessory genes in comparison to existing tools. Availability and implementation: Source code at https://github.com/lh3/pangene; prebuilt pangene graphs can be downloaded from https://zenodo.org/records/8118576 and visualized at https://pangene.bioinweb.org

Cite

CITATION STYLE

APA

Li, H., Marin, M., & Farhat, M. R. (2024). Exploring gene content with pangene graphs. Bioinformatics, 40(7). https://doi.org/10.1093/bioinformatics/btae456

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free