Using QC-blind for quality control and contamination screening of bacteria DNA sequencing data without reference genome


Abstract

Quality control for next-generation sequencing (NGS) data has become increasingly important as sequencing data play an ever larger role in omics studies. Existing tools can filter out possible contaminants, but only for species with a known reference genome: they require reference genomes for all the species involved, including the contaminants. This excludes many real-life samples for which the complete genome of the target species is unknown and which are contaminated with unknown microbial species. In this work we propose QC-Blind, a novel quality control pipeline that removes contaminants without using any reference genome. The pipeline requires only information about a few marker genes of the target species. It consists of four stages: unsupervised read assembly, contig binning, read clustering, and marker gene assignment. When evaluated on in silico, ab initio, and in vivo datasets, QC-Blind proved effective in removing unknown contaminants with high specificity and accuracy, while preserving most of the genomic information of the target bacterial species. QC-Blind is therefore well suited to situations where limited information is available for both the target and the contaminating species.
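The abstract does not say which binning algorithm or marker-gene search QC-Blind uses, so the following is only a minimal sketch of the idea behind the final marker gene assignment stage, not the tool's implementation. It assumes three hypothetical inputs: a contig-to-bin mapping (from any binning tool), the set of contigs on which a target-species marker gene was detected, and a read-to-contig mapping (`None` for reads that did not assemble into any contig). A bin is treated as belonging to the target species if any of its contigs carries a marker hit; reads in all other bins are flagged as putative contaminants, and unassembled reads are reported separately rather than silently discarded.

```python
from typing import Dict, List, Optional, Set, Tuple


def label_bins(contig_to_bin: Dict[str, str],
               target_marker_contigs: Set[str]) -> Dict[str, bool]:
    """Mark a bin True (target) if any of its contigs carries a target marker-gene hit.

    Hypothetical rule for illustration; QC-Blind's actual assignment criterion may differ.
    """
    labels: Dict[str, bool] = {}
    for contig, bin_id in contig_to_bin.items():
        labels[bin_id] = labels.get(bin_id, False) or contig in target_marker_contigs
    return labels


def split_reads(read_to_contig: Dict[str, Optional[str]],
                contig_to_bin: Dict[str, str],
                target_marker_contigs: Set[str]) -> Tuple[List[str], List[str], List[str]]:
    """Partition reads into (kept, removed, unassigned) based on their contig's bin label."""
    labels = label_bins(contig_to_bin, target_marker_contigs)
    kept: List[str] = []
    removed: List[str] = []
    unassigned: List[str] = []
    for read, contig in read_to_contig.items():
        # Reads with no contig, or whose contig was never binned, cannot be labelled.
        bin_id = contig_to_bin.get(contig) if contig is not None else None
        if bin_id is None:
            unassigned.append(read)
        elif labels[bin_id]:
            kept.append(read)      # bin carries a target marker gene
        else:
            removed.append(read)   # putative contaminant
    return kept, removed, unassigned


if __name__ == "__main__":
    # Toy, made-up data: contigs c1 and c2 share binA; c2 carries a target marker gene.
    contig_to_bin = {"c1": "binA", "c2": "binA", "c3": "binB"}
    target_marker_contigs = {"c2"}
    read_to_contig = {"r1": "c1", "r2": "c3", "r3": "c2", "r4": None}
    kept, removed, unassigned = split_reads(read_to_contig, contig_to_bin, target_marker_contigs)
    print(kept, removed, unassigned)  # ['r1', 'r3'] ['r2'] ['r4']
```

Labelling whole bins rather than individual contigs is what lets a handful of marker genes rescue contigs that carry no marker themselves (here `c1` is kept only because it shares `binA` with `c2`); the trade-off is that a mis-binned contaminant contig would be kept as well.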

Citation (APA)

Xi, W., Gao, Y., Cheng, Z., Chen, C., Han, M., Yang, P., … Ning, K. (2019). Using QC-blind for quality control and contamination screening of bacteria DNA sequencing data without reference genome. Frontiers in Microbiology, 10, 1560. https://doi.org/10.3389/fmicb.2019.01560
