excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies

9Citations
Citations of this article
17Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Summary: Exclusion regions are sections of reference genomes with abnormal pileups of short sequencing reads. Removing reads overlapping them improves biological signal, and these benefits are most pronounced in differential analysis settings. Several labs created exclusion region sets, available primarily through ENCODE and Github. However, the variety of exclusion sets creates uncertainty which sets to use. Furthermore, gap regions (e.g. centromeres, telomeres, short arms) create additional considerations in generating exclusion sets. We generated exclusion sets for the latest human T2T-CHM13 and mouse GRCm39 genomes and systematically assembled and annotated these and other sets in the excluderanges R/Bioconductor data package, also accessible via the BEDbase.org API. The package provides unified access to 82 GenomicRanges objects covering six organisms, multiple genome assemblies, and types of exclusion regions. For human hg38 genome assembly, we recommend hg38.Kundaje.GRCh38_unified_blacklist as the most well-curated and annotated, and sets generated by the Blacklist tool for other organisms.

Cite

CITATION STYLE

APA

Ogata, J. D., Mu, W., Davis, E. S., Xue, B., Harrell, J. C., Sheffield, N. C., … Dozmorov, M. G. (2023). excluderanges: exclusion sets for T2T-CHM13, GRCm39, and other genome assemblies. Bioinformatics, 39(4). https://doi.org/10.1093/bioinformatics/btad198

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free