Strand-seq enables reliable separation of long reads by chromosome via expectation maximization

Maryam Ghareghani; David Porubsky; Ashley D. Sanders; Sascha Meiers; Evan E. Eichler; Jan O. Korbel; Tobias Marschall

Conference Proceedings, Open Access

Bioinformatics (2018) 34(13) i115-i123

DOI: 10.1093/bioinformatics/bty290

Abstract

Motivation: Current sequencing technologies are able to produce reads orders of magnitude longer than was ever possible before. Such long reads have sparked new interest in de novo genome assembly, which removes reference biases inherent to re-sequencing approaches and allows for a direct characterization of complex genomic variants. However, even with the latest algorithmic advances, assembling a mammalian genome from long, error-prone reads incurs a significant computational burden and does not preclude occasional misassemblies. Both problems could potentially be mitigated if assembly could commence for each chromosome separately.

Results: To address this, we show how single-cell template strand sequencing (Strand-seq) data can be leveraged for this purpose. We introduce a novel latent variable model and a corresponding Expectation Maximization algorithm, termed SaaRclust, and demonstrate its ability to reliably cluster long reads by chromosome. For each long read, this approach produces a posterior probability distribution over all chromosomes of origin and read directionalities. In this way, it allows one to assess the amount of uncertainty inherent to sparse Strand-seq data at the level of individual reads. Among the reads that our algorithm confidently assigns to a chromosome, we observed more than 99% correct assignments on a subset of Pacific Biosciences reads with 30.1× coverage. To our knowledge, SaaRclust is the first approach for the in silico separation of long reads by chromosome prior to assembly.
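
The abstract describes SaaRclust only at a high level: a latent variable model fitted by Expectation Maximization that yields, for each long read, a posterior over chromosome of origin and read directionality. The Python sketch that follows illustrates that general idea on a deliberately simplified, hypothetical model; it is not the SaaRclust model or its code. It assumes that, for every long read and every Strand-seq cell, we already have counts W and C of Strand-seq reads aligning to that long read in Watson and Crick orientation; that each cluster (chromosome) is in one of the strand states WW, CC or WC in each cell; that a fixed error rate eps explains stray reads; and that both directionalities are a priori equally likely. The function name em_cluster_reads and its parameters are illustrative only.

```python
import numpy as np


def em_cluster_reads(W, C, n_clusters, n_iter=50, eps=0.05, seed=0):
    """Toy EM (hypothetical model, not SaaRclust) assigning long reads to
    clusters (chromosomes) and directionalities.

    W, C: (n_reads, n_cells) counts of Strand-seq reads aligning to each long
    read in Watson / Crick orientation, per single cell (assumed input format).
    Returns gamma (n_reads, 2, n_clusters) from the last E-step,
    theta (n_clusters, n_cells, 3) after the last M-step, and the
    log-likelihood recorded at each iteration.
    """
    W = np.asarray(W, dtype=float)
    C = np.asarray(C, dtype=float)
    n_reads, n_cells = W.shape
    rng = np.random.default_rng(seed)
    # Assumed P(Strand-seq read is Watson | strand state) for WW, CC, WC.
    p_w = np.array([1.0 - eps, eps, 0.5])
    log_w, log_c = np.log(p_w), np.log1p(-p_w)
    # Per-cell log-likelihood of the counts under each strand state, for both
    # directionalities: d=1 means the long read is reversed, so W and C swap.
    ll_fwd = W[:, :, None] * log_w + C[:, :, None] * log_c
    ll_rev = C[:, :, None] * log_w + W[:, :, None] * log_c
    ll = np.stack([ll_fwd, ll_rev], axis=1)            # (R, 2, cells, 3)
    # Random initial strand-state probabilities per cluster and cell.
    theta = rng.dirichlet(np.ones(3), size=(n_clusters, n_cells))
    pi = np.full(n_clusters, 1.0 / n_clusters)         # cluster priors
    floor = 1e-12                                      # keeps logs finite
    history = []
    for _ in range(n_iter):
        # E-step: marginalize each cell's strand state, then multiply cells.
        joint = ll[:, :, None] + np.log(theta)[None, None]   # (R, 2, K, cells, 3)
        per_cell = np.logaddexp.reduce(joint, axis=-1)         # (R, 2, K, cells)
        log_post = per_cell.sum(axis=-1) + np.log(pi) + np.log(0.5)  # (R, 2, K)
        log_norm = np.logaddexp.reduce(log_post.reshape(n_reads, -1), axis=1)
        history.append(log_norm.sum())
        gamma = np.exp(log_post - log_norm[:, None, None])
        # M-step: cluster priors and per-cell strand-state probabilities.
        pi = gamma.sum(axis=(0, 1)) / n_reads
        pi = (pi + floor) / (pi + floor).sum()
        # P(strand state | read, direction, cluster, cell).
        state_resp = np.exp(joint - per_cell[..., None])
        counts = np.einsum("idk,idkcs->kcs", gamma, state_resp)
        theta = (counts + floor) / (counts + floor).sum(axis=-1, keepdims=True)
    return gamma, theta, np.array(history)
```

Here gamma[i, d, k] is the posterior that read i comes from cluster k with directionality d; summing over d gives a per-read distribution over clusters, whose maximum could be thresholded to keep only confident assignments, in the spirit of the abstract. Because clusters are unlabeled and this toy model is symmetric under swapping WW with CC together with the read direction, EM may settle in a local optimum, so one would typically compare several seeds by their final log-likelihood.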

Citation (APA)

Ghareghani, M., Porubsky, D., Sanders, A. D., Meiers, S., Eichler, E. E., Korbel, J. O., & Marschall, T. (2018). Strand-seq enables reliable separation of long reads by chromosome via expectation maximization. In Bioinformatics (Vol. 34, Issue 13, pp. i115–i123). Oxford University Press. https://doi.org/10.1093/bioinformatics/bty290