Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination

This article is free to access.

Abstract

Although contamination in bacterial whole-genome sequencing is assumed to cause errors, its influence on clustering analyses, such as single-nucleotide polymorphism discovery, phylogenetics, and multi-locus sequence typing, has not been quantified. By developing and analyzing 720 Listeria monocytogenes, Salmonella enterica, and Escherichia coli short-read datasets, we demonstrate that within-species contamination causes errors that confound clustering analyses, while between-species contamination generally does not. These errors arise when contaminant reads map to references or become incorporated into chimeric sequences during assembly. Contamination sufficient to influence clustering analyses is present in public sequence databases.

Cite

Pightling, A. W., Pettengill, J. B., Wang, Y., Rand, H., & Strain, E. (2019). Within-species contamination of bacterial whole-genome sequence data has a greater influence on clustering analyses than between-species contamination. Genome Biology, 20(1). https://doi.org/10.1186/s13059-019-1914-x