Rapid, paralog-sensitive CNV analysis of 2457 human genomes using QuicK-mer2

Abstract

Gene duplication is a major mechanism for the evolution of gene novelty, and copy-number variation makes a major contribution to inter-individual genetic diversity. However, most approaches for studying copy-number variation rely upon uniquely mapping reads to a genome reference and are unable to distinguish among duplicated sequences. Specialized approaches to interrogate specific paralogs are comparatively slow and have a high degree of computational complexity, limiting their effective application to emerging population-scale data sets. We present QuicK-mer2, a self-contained, mapping-free approach that enables the rapid construction of paralog-specific copy-number maps from short-read sequence data. This approach is based on the tabulation of unique k-mer sequences from short-read data sets, and is able to analyze a 20X coverage human genome in approximately 20 min. We applied our approach to newly released sequence data from the 1000 Genomes Project, constructed paralog-specific copy-number maps from 2457 unrelated individuals, and uncovered copy-number variation of paralogous genes. We identify nine genes where none of the analyzed samples have a copy number of two, 92 genes where the majority of samples have a copy number other than two, and describe rare copy-number variation affecting multiple genes at the APOBEC3 locus.
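
The idea of tabulating unique k-mers can be illustrated with a minimal Python sketch. This is not the QuicK-mer2 implementation: the function names, the toy sequences, and the normalization against a control region assumed to be at two copies are all hypothetical choices made for illustration, and the sketch ignores reverse complements, sequencing errors, and GC bias.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def unique_reference_kmers(reference, k):
    """Return the k-mers that occur exactly once in the reference.

    A k-mer shared by two paralogs is excluded, so each remaining
    k-mer identifies a single location (illustrative assumption).
    """
    counts = Counter(kmers(reference, k))
    return {km for km, n in counts.items() if n == 1}


def tabulate_depth(reads, unique_kmers, k):
    """Count read occurrences of each unique k-mer, with no read mapping."""
    depth = Counter()
    for read in reads:
        for km in kmers(read, k):
            if km in unique_kmers:
                depth[km] += 1
    return depth


def estimate_copy_number(depth, region_kmers, control_kmers, control_copies=2):
    """Scale mean depth over a region by mean depth over a control region.

    control_copies is the copy number assumed for the control region.
    """
    control_mean = sum(depth[km] for km in control_kmers) / len(control_kmers)
    region_mean = sum(depth[km] for km in region_kmers) / len(region_kmers)
    return control_copies * region_mean / control_mean


# Toy data: "ACGT" occurs twice in the reference, so it is not unique.
reference = "ACGTACGTTT"
unique = unique_reference_kmers(reference, k=4)
depth = tabulate_depth(["CGTA", "CGTA", "GTTT"], unique, k=4)
copy_number = estimate_copy_number(depth, ["CGTA"], ["GTTT"])
```

In this toy case the region k-mer is seen twice as often as the control k-mer, so the estimate is twice the assumed control copy number. Real data would need far longer k-mers and many k-mers per region for the depth averages to be meaningful.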

Cite

Shen, F., & Kidd, J. M. (2020). Rapid, paralog-sensitive CNV analysis of 2457 human genomes using quick-mer2. Genes, 11(2). https://doi.org/10.3390/genes11020141
