LoRDEC: Accurate and efficient long read error correction

Abstract

Motivation: PacBio single-molecule real-time sequencing is a third-generation sequencing technique that produces long reads, with comparatively lower throughput and a higher error rate. Errors include numerous indels and complicate downstream analyses such as mapping or de novo assembly. A hybrid strategy that takes advantage of the high accuracy of second-generation short reads has been proposed for correcting long reads. Mapping short reads onto long reads provides sufficient coverage to eliminate up to 99% of errors, but at the expense of prohibitive running times and considerable amounts of disk and memory space.

Results: We present LoRDEC, a hybrid error correction method that builds a succinct de Bruijn graph representing the short reads and seeks a corrective sequence for each erroneous region in the long reads by traversing chosen paths in the graph. In comparison, LoRDEC is at least six times faster and requires at least 93% less memory or disk space than available tools, while achieving comparable accuracy.
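The graph-traversal idea in the Results can be illustrated with a toy sketch. This is not LoRDEC's implementation: a plain Python set of k-mers stands in for the succinct de Bruijn graph, and every function name, the "trusted" label for long-read k-mers found in the short reads, the `max_extra` path-length bound, and the choice of edit distance to rank candidate paths are assumptions made for illustration only.

```python
def build_kmer_set(short_reads, k):
    """Collect all k-mers of the short reads (nodes of a toy de Bruijn graph)."""
    kmers = set()
    for read in short_reads:
        for i in range(len(read) - k + 1):
            kmers.add(read[i:i + k])
    return kmers


def successors(kmer, kmers):
    """Yield the k-mers that overlap `kmer` by k-1 bases (outgoing edges)."""
    for base in "ACGT":
        candidate = kmer[1:] + base
        if candidate in kmers:
            yield candidate


def edit_distance(a, b):
    """Levenshtein distance, computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                      # deletion
                               current[j - 1] + 1,                   # insertion
                               previous[j - 1] + (char_a != char_b)))  # substitution
        previous = current
    return previous[-1]


def best_bridge(source, target, original, kmers, max_extra):
    """Depth-first search for a path from `source` to `target`.

    Returns the bases spelled after `source` (ending with `target`) that are
    closest to `original` in edit distance, or None if no path is found
    within len(original) + max_extra bases.
    """
    max_len = len(original) + max_extra
    best, best_cost = None, None
    stack = [(source, "")]
    while stack:
        node, spelled = stack.pop()
        if spelled and node == target:
            cost = edit_distance(spelled, original)
            if best_cost is None or cost < best_cost:
                best, best_cost = spelled, cost
            continue  # simplification: do not extend a path past the target
        if len(spelled) < max_len:
            for nxt in successors(node, kmers):
                stack.append((nxt, spelled + nxt[-1]))
    return best


def correct_read(long_read, kmers, k, max_extra=3):
    """Replace each region between consecutive trusted k-mers by a graph path."""
    trusted = [i for i in range(len(long_read) - k + 1)
               if long_read[i:i + k] in kmers]
    if not trusted:
        return long_read
    # The head before the first trusted k-mer is left uncorrected in this sketch.
    corrected = [long_read[:trusted[0] + k]]
    for i, j in zip(trusted, trusted[1:]):
        original = long_read[i + k:j + k]
        bridge = None
        if j > i + 1:  # a gap of untrusted k-mers lies between i and j
            bridge = best_bridge(long_read[i:i + k], long_read[j:j + k],
                                 original, kmers, max_extra)
        corrected.append(bridge if bridge is not None else original)
    # The tail after the last trusted k-mer is also left uncorrected.
    corrected.append(long_read[trusted[-1] + k:])
    return "".join(corrected)


# Made-up example data: a short reference, three overlapping "short reads",
# and a "long read" carrying one substitution.
reference = "ATGGCGTACCTGAAGTCTAGCA"
short_reads = [reference[0:12], reference[6:18], reference[10:22]]
kmers = build_kmer_set(short_reads, k=4)
print(correct_read("ATGGCGTACCAGAAGTCTAGCA", kmers, k=4))
```

The exhaustive search here is exponential in the worst case and is only workable on toy inputs; a practical tool would need to bound the search and use a compact graph representation, which this sketch does not attempt.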

Cite

Salmela, L., & Rivals, E. (2014). LoRDEC: Accurate and efficient long read error correction. Bioinformatics, 30(24), 3506–3514. https://doi.org/10.1093/bioinformatics/btu538
