Assembly of non-unique insertion content using next-generation sequencing

11Citations
Citations of this article
33Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

Recent studies in genomics have highlighted the significance of sequence insertions in determining individual variation. Efforts to discover the content of these sequence insertions have been limited to short insertions and long unique insertions. Much of the inserted sequence in the typical human genome, however, is a mixture of repeated and unique sequence. Current methods are designed to assemble only unique sequence insertions, using reads that do not map to the reference. These methods are not able to assemble repeated sequence insertions, as the reads will map to the reference in a different locus. In this paper, we present a computational method for discovering the content of sequence insertions that are unique, repeated, or a combination of the two. Our method analyzes the read mappings and depth of coverage of paired-end reads to identify reads that originated from inserted sequence. We demonstrate the process of assembling these reads to characterize the insertion content. Our method is based on the idea of segment extension, which progressively extends segments of known content using paired-end reads. We apply our method in simulation to discover the content of inserted sequences in a modified mouse chromosome and show that our method produces reliable results at 40x coverage. © 2011 Parrish et al; licensee BioMed Central Ltd.

Cite

CITATION STYLE

APA

Parrish, N., Hormozdiari, F., & Eskin, E. (2011). Assembly of non-unique insertion content using next-generation sequencing. BMC Bioinformatics, 12(SUPPL.6). https://doi.org/10.1186/1471-2105-12-S6-S3

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free