GAML: Genome assembly by maximum likelihood

Vladimír Boža; Broňa Brejová; Tomáš Vinař

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014), Vol. 8701 LNBI, pp. 122-134

DOI: 10.1007/978-3-662-44753-6_10

Abstract

The critical part of genome assembly is the resolution of repeats and the scaffolding of shorter contigs. Modern assemblers usually perform this step with heuristics, often tailored to a particular technology for producing paired reads or long reads. We propose a new framework that allows the systematic combination of diverse sequencing datasets into a single assembly. We achieve this by searching for the assembly with maximum likelihood in a probabilistic model that captures the error rate, insert lengths, and other characteristics of each sequencing technology. We have implemented a prototype genome assembler, GAML, that can use any combination of insert sizes with Illumina or 454 reads, as well as PacBio reads. Our experiments show that we can assemble short genomes with N50 sizes and error rates comparable to those of ALLPATHS-LG or Cerulean. While ALLPATHS-LG and Cerulean each require a specific combination of datasets, GAML works on any combination. Data and software are available at http://compbio.fmph.uniba.sk/gaml. © 2014 Springer-Verlag Berlin Heidelberg.
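
The core idea of scoring a candidate assembly by the likelihood of the observed reads can be illustrated with a toy model. The sketch below is not GAML's actual model or search procedure. It assumes ungapped alignments, substitution-only errors at a single uniform rate, a uniformly distributed fragment start, a hypothetical discrete insert-length distribution `insert_dist`, and both mates given on the forward strand. All function names (`alignment_prob`, `single_read_prob`, `pair_prob`, `log_likelihood`) are illustrative, and the "search" is just a brute-force maximum over a hand-picked list of candidate assemblies.

```python
import math
from typing import Dict, List, Tuple


def alignment_prob(read: str, assembly: str, pos: int, error_rate: float) -> float:
    """P(read | read was sampled starting at pos); ungapped, substitution errors only."""
    p = 1.0
    for i, base in enumerate(read):
        if assembly[pos + i] == base:
            p *= 1.0 - error_rate
        else:
            p *= error_rate / 3.0  # assumed: the wrong base is one of the other three, uniformly
    return p


def single_read_prob(read: str, assembly: str, error_rate: float) -> float:
    """P(read | assembly), averaging over all start positions (uniform start assumed)."""
    n_starts = len(assembly) - len(read) + 1
    if n_starts <= 0:
        return 0.0
    total = sum(alignment_prob(read, assembly, s, error_rate) for s in range(n_starts))
    return total / n_starts


def pair_prob(left: str, right: str, assembly: str, error_rate: float,
              insert_dist: Dict[int, float]) -> float:
    """P(pair | assembly): sum over insert length L and fragment start s.

    The left mate starts at s; the right mate ends at s + L (both on the forward strand).
    """
    total = 0.0
    for length, p_len in insert_dist.items():
        n_starts = len(assembly) - length + 1
        if n_starts <= 0 or length < max(len(left), len(right)):
            continue  # fragment of this length cannot hold the mates or fit the assembly
        inner = 0.0
        for s in range(n_starts):
            inner += (alignment_prob(left, assembly, s, error_rate)
                      * alignment_prob(right, assembly, s + length - len(right), error_rate))
        total += p_len * inner / n_starts
    return total


def safe_log(p: float) -> float:
    return math.log(p) if p > 0.0 else float("-inf")


def log_likelihood(assembly: str, single_reads: List[str],
                   paired_reads: List[Tuple[str, str]], error_rate: float,
                   insert_dist: Dict[int, float]) -> float:
    """Reads are assumed independent, so per-read log-probabilities add up."""
    ll = sum(safe_log(single_read_prob(r, assembly, error_rate)) for r in single_reads)
    ll += sum(safe_log(pair_prob(l, r, assembly, error_rate, insert_dist))
              for l, r in paired_reads)
    return ll


# Toy usage: pick the candidate assembly that best explains the reads.
candidates = ["ACGTTGGTCA", "ACGAAAAAAA"]
singles = ["ACGT", "GGTC"]
pairs = [("ACGT", "GGTC")]
best = max(candidates, key=lambda a: log_likelihood(a, singles, pairs, 0.01, {9: 1.0}))
print(best)  # ACGTTGGTCA
```

Working in log space keeps the product over many reads from underflowing, and a read that cannot be explained at all drives the score to negative infinity. Per the abstract, different datasets enter the same objective through their own technology-specific terms; here that is mimicked only by the separate single-read and paired-read terms.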

Citation (APA):

Boža, V., Brejová, B., & Vinař, T. (2014). GAML: Genome assembly by maximum likelihood. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8701 LNBI, pp. 122–134). Springer Verlag. https://doi.org/10.1007/978-3-662-44753-6_10