Background: Many methods have been developed for metagenomic sequence classification, and most of them depend heavily on the genome sequences of known organisms. As a result, a large portion of the sequencing reads in a sample may be classified as unknown, which greatly impairs our understanding of the sample as a whole.

Results: Here we present MetaBinG2, a fast method for metagenomic sequence classification, designed especially for samples with a large number of unknown organisms. MetaBinG2 is based on sequence composition and uses GPUs for acceleration. A million 100 bp Illumina sequences can be classified in about 1 min on a computer with one GPU card. We evaluated MetaBinG2 by comparing it to multiple popular existing methods. We then applied MetaBinG2 to the MetaSUB Inter-City Challenge dataset provided by the CAMDA data analysis contest and compared the community composition structures of environmental samples from different public places across cities.

Conclusion: Compared to existing methods, MetaBinG2 is fast and accurate, especially for samples with significant proportions of unknown organisms.

Reviewers: This article was reviewed by Drs. Eran Elhaik, Nicolas Rascovan, and Serghei Mangul.
CITATION STYLE
Qiao, Y., Jia, B., Hu, Z., Sun, C., Xiang, Y., & Wei, C. (2018). MetaBinG2: A fast and accurate metagenomic sequence classification system for samples with many unknown organisms. Biology Direct, 13(1). https://doi.org/10.1186/s13062-018-0220-y