Comparing De Novo genome assembly: The long and short of it

Abstract

Recent advances in DNA sequencing technology and their focal role in Genome Wide Association Studies (GWAS) have rekindled interest in the whole-genome sequence assembly (WGSA) problem, thereby inundating the field with a plethora of new formalizations, algorithms, heuristics and implementations. And yet, scant attention has been paid to comparative assessments of these assemblers' quality and accuracy. No commonly accepted and standardized method for comparison exists yet. Even worse, widely used metrics for comparing assembled sequences emphasize only size and capture contig quality and accuracy poorly. This paper addresses these concerns: it highlights common anomalies in assembly accuracy through a rigorous study of several assemblers, compared under both standard metrics (N50, coverage, contig sizes, etc.) and a more comprehensive metric (Feature-Response Curves, FRC) that is introduced here; FRC transparently captures the trade-off between contigs' quality and their sizes. For this purpose, most of the publicly available major sequence assemblers, for both low-coverage long (Sanger) and high-coverage short (Illumina) read technologies, are compared. These assemblers are applied to microbial (Escherichia coli, Brucella, Wolbachia, Staphylococcus, Helicobacter) and partial human genome sequences (Chr. Y), using sequence reads of various read lengths, coverages, and accuracies, with and without mate-pairs. It is hoped that, based on these evaluations, computational biologists will identify innovative sequence assembly paradigms, bioinformaticists will determine promising approaches for developing "next-generation" assemblers, and biotechnologists will formulate more meaningful design desiderata for sequencing technology platforms. A new software tool for computing the FRC metric has been developed and is available through the AMOS open-source consortium. © 2011 Narzisi and Mishra.
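To make concrete the abstract's remark that size-based metrics capture quality poorly, the sketch below computes N50 under its usual definition: the contig length L such that contigs of length L or greater together cover at least half of the total assembled length. The function name `n50` and the contig lengths are illustrative assumptions, not taken from the paper or from the AMOS tools.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (standard definition).

    Hypothetical helper for illustration; not part of the paper's software.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running sum to half the total is N50.
        if running >= half_total:
            return length
    raise ValueError("N50 is undefined for an empty list of contigs")


# Made-up lengths: four contigs assembled correctly.
correct = [40, 30, 20, 10]
# Same total length, but the two largest contigs were joined in error.
misjoined = [70, 20, 10]

print(n50(correct))    # 30
print(n50(misjoined))  # 70
```

In this toy case the erroneous join more than doubles N50 even though the assembly is less accurate, which is the kind of anomaly a size-only metric cannot reveal.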

Citation (APA)

Narzisi, G., & Mishra, B. (2011). Comparing De Novo genome assembly: The long and short of it. PLoS ONE, 6(4). https://doi.org/10.1371/journal.pone.0019175
