Proovread: Large-scale high-accuracy PacBio correction through iterative short read consensus


Abstract

Motivation: Today, the base sequence of DNA is mostly determined through sequencing by synthesis, as provided by Illumina sequencers. Although highly accurate, the resulting reads are short, which makes their analysis challenging. Recently, a new technology, single molecule real-time (SMRT) sequencing, was developed that could address these challenges, as it generates reads several thousand bases long. However, its broad application has been hampered by a high error rate. Therefore, hybrid approaches that use high-quality short reads to correct erroneous SMRT long reads have been developed. Still, current implementations place high demands on hardware, work only in well-defined computing infrastructures and reject a substantial fraction of reads. This limits their usability considerably, especially for large sequencing projects.

Results: Here we present proovread, a hybrid correction pipeline for SMRT reads, which can be flexibly adapted to existing hardware and infrastructure, from a laptop to a high-performance computing cluster. On genomic and transcriptomic test cases covering Escherichia coli, Arabidopsis thaliana and human, proovread achieved accuracies of up to 99.9% and outperformed existing hybrid correction programs. Furthermore, proovread-corrected sequences were longer and throughput was higher. Thus, proovread combines the most accurate correction results with excellent adaptability to the available hardware. It will therefore increase the applicability and value of SMRT sequencing.
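The basic idea behind hybrid correction is to let overlapping, accurate short reads vote on each position of an error-prone long read. The following minimal sketch illustrates that idea and is not proovread's implementation. It makes the following assumptions:

- The short reads have already been placed on the long read at known offsets without gaps. The mapping step itself is not shown.
- The function name `consensus_correct` and the `min_cov` threshold are hypothetical.
- Only substitution errors are corrected. Real SMRT errors are predominantly insertions and deletions, which this ungapped majority vote ignores.

```python
from collections import Counter


def consensus_correct(long_read: str,
                      alignments: list[tuple[int, str]],
                      min_cov: int = 3) -> str:
    """Hypothetical sketch: correct substitutions in a long read by a
    per-position majority vote over short reads placed at fixed,
    ungapped offsets (indels are NOT handled)."""
    # One base counter ("pileup column") per long-read position.
    columns = [Counter() for _ in long_read]
    for offset, short in alignments:
        for i, base in enumerate(short):
            pos = offset + i
            if 0 <= pos < len(long_read):  # ignore overhanging bases
                columns[pos][base] += 1

    corrected = []
    for original, col in zip(long_read, columns):
        if sum(col.values()) >= min_cov:
            # Enough short-read support: take the most frequent base.
            corrected.append(col.most_common(1)[0][0])
        else:
            # Too little coverage to decide: keep the long-read base.
            corrected.append(original)
    return "".join(corrected)


if __name__ == "__main__":
    long_read = "ACGTTAGCA"  # position 4 carries an erroneous 'T'
    short_reads = [(0, "ACGTA"), (2, "GTAAG"), (4, "AAGCA"), (3, "TAAGC")]
    print(consensus_correct(long_read, short_reads))  # ACGTAAGCA
```

In this example, four short reads cover position 4 and all of them report `A`, so the erroneous `T` is replaced. Positions with fewer than `min_cov` supporting reads keep their original base. Leaving those bases unchanged, instead of guessing, is a conservative choice for poorly covered regions.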


Hackl, T., Hedrich, R., Schultz, J., & Förster, F. (2014). Proovread: Large-scale high-accuracy PacBio correction through iterative short read consensus. Bioinformatics, 30(21), 3004–3011. https://doi.org/10.1093/bioinformatics/btu392
