Reduced ambiguity and improved interpretability of bacterial genome-wide associations using gene-cluster-centric k-mers

Hannes Sommer; Dilfuza Djamalova; Marco Galardini

Journal Article (Open Access)

Microbial Genomics (2023) 9(11)

DOI: 10.1099/mgen.0.001129

Abstract

The wide adoption of bacterial genome sequencing, together with the encoding of both core and accessory genome variation as k-mers, has allowed bacterial genome-wide association studies (GWAS) to identify genetic variants associated with relevant phenotypes, such as those linked to infection. Significant limitations remain, however: k-mers are duplicated across gene clusters, and association results are difficult to interpret, which hinders the wider adoption of GWAS methods on microbial data sets. We have developed a simple computational method (panfeed) that explicitly links each k-mer to its gene cluster at base-resolution level, which allows us to avoid biases introduced by a global de Bruijn graph and to map and annotate associated variants more easily. We tested panfeed on two independent data sets, correctly identifying previously characterized causal variants, which demonstrates the precision of the method as well as its scalable performance. panfeed is a command-line tool written in the Python programming language and is available at https://github.com/microbial-pangenomes-lab/panfeed.
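To make the idea of gene-cluster-centric k-mers concrete, the following is a minimal sketch in Python. It is not panfeed's implementation or interface: the function names (`cluster_kmers`, `presence_patterns`), the input layout (a dictionary mapping each gene cluster to per-strain sequences), and the toy sequences are all assumptions made for illustration. It shows only the general principle that the same k-mer, when keyed by its gene cluster and tracked with its offset, can carry a different presence/absence pattern in each cluster.

```python
from collections import defaultdict


def cluster_kmers(clusters, k):
    """Yield (cluster, kmer, strain, offset) for every k-mer of every gene.

    `clusters` is a hypothetical layout: {cluster_id: {strain_id: sequence}}.
    The offset records where the k-mer starts within the gene sequence.
    """
    for cluster, genes in clusters.items():
        for strain, seq in genes.items():
            for offset in range(len(seq) - k + 1):
                yield cluster, seq[offset:offset + k], strain, offset


def presence_patterns(clusters, k):
    """Map each (cluster, kmer) pair to the set of strains that carry it."""
    patterns = defaultdict(set)
    for cluster, kmer, strain, _offset in cluster_kmers(clusters, k):
        patterns[(cluster, kmer)].add(strain)
    return dict(patterns)


# Toy data (invented for illustration): two gene clusters, two strains.
clusters = {
    "geneA": {"strain1": "ATGCA", "strain2": "ATGGA"},
    "geneB": {"strain1": "CATGC", "strain2": "CATGC"},
}
patterns = presence_patterns(clusters, 3)

# The k-mer TGC occurs in both clusters but with different strain patterns.
print(sorted(patterns[("geneA", "TGC")]))
print(sorted(patterns[("geneB", "TGC")]))
```

In this toy example, a k-mer counted globally across both genes would appear present in every strain, whereas keying it by gene cluster keeps the variable copy in `geneA` separate from the conserved copy in `geneB`.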

Cite

Sommer, H., Djamalova, D., & Galardini, M. (2023). Reduced ambiguity and improved interpretability of bacterial genome-wide associations using gene-cluster-centric k-mers. Microbial Genomics, 9(11). https://doi.org/10.1099/mgen.0.001129
