Metagenomic binning refers to the means of clustering or assigning taxonomy to metagenomic sequences or contigs. Due to the massive abundance of organisms in metagenomic samples, the number of nucleotide sequences skyrockets, and thus leading to the complexity of binning algorithms. Unsupervised classification is gaining a reputation in recent years since the lacking of the reference database required in the reference-based methods with various state-of-the-art tools released. By manipulating the overlapping information between reads drives to the success of various unsupervised methods with extraordinary accuracy. These research practices on the evidence that the average proportion of common l-mers between genomes of different species is practically miniature when l is sufficient. This paper introduces a novel algorithm for binning metagenomic sequences without requiring reference databases by utilizing highly connected components inside a weighted overlapping graph of reads. Experimental outcomes show that the precision is improved over other well-known binning tools for both short and long sequences.
CITATION STYLE
Pham, H. T., Vinh, L. V., Lang, T. V., & Tran, V. H. (2019). GMeta: A Novel Algorithm to Utilize Highly Connected Components for Metagenomic Binning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11814 LNCS, pp. 545–559). Springer. https://doi.org/10.1007/978-3-030-35653-8_35
Mendeley helps you to discover research relevant for your work.