The goal of metagenomic binning is to reconstruct genomes from a mixture of DNA sequences by grouping the sequences into genomic bins, which can be considered a clustering task. Multiple methods have been proposed for this task, including distance-based metrics, machine learning, and ensemble approaches. We propose BinChill, a metagenomic ensemble method based on the generic co-occurrence ensemble method ACE. BinChill combines domain information, in the form of single-copy genes (SCGs), with a co-occurrence strategy. This strategy merges multiple clustering partitions according to how often two items co-occur in the same cluster. On a smaller simulated dataset, BinChill reconstructed at least as many high- and medium-quality bins as other metagenomics-specific methods, with an equal or faster runtime. On larger datasets, both simulated and real-world, BinChill outperformed other methods in reconstructing high-quality bins, at the cost of an increased processing time compared to generic ensemble clustering algorithms. This increase is due to the domain-specific steps that our method implements. Our results show that the strengths of multiple partitions can be combined to generate a partition of higher quality.
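The co-occurrence idea mentioned in the abstract can be illustrated with a minimal sketch. This is not BinChill's or ACE's actual implementation: it omits the SCG-based steps entirely, and the function names (`co_occurrence`, `consensus_clusters`), the `threshold` parameter, and the contig labels are hypothetical choices made for illustration. It only shows how pairwise co-occurrence frequencies across several partitions could be turned into a consensus partition, here by linking every pair that shares a cluster in more than a given fraction of the input partitions.

```python
from collections import Counter
from itertools import combinations


def co_occurrence(partitions):
    """Fraction of partitions in which each pair of items shares a cluster."""
    counts = Counter()
    for partition in partitions:
        for cluster in partition:
            # Sort so each pair is always stored in the same order.
            for a, b in combinations(sorted(cluster), 2):
                counts[(a, b)] += 1
    n = len(partitions)
    return {pair: c / n for pair, c in counts.items()}


def consensus_clusters(items, partitions, threshold=0.5):
    """Link pairs whose co-occurrence frequency exceeds `threshold`.

    Assumption for illustration: linked items are merged transitively
    with a union-find structure, which is a simplification and not the
    merging rule used by ACE or BinChill.
    """
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), freq in co_occurrence(partitions).items():
        if freq > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), set()).add(item)
    return sorted(sorted(g) for g in groups.values())


# Three hypothetical binning results over five contigs.
items = ["c1", "c2", "c3", "c4", "c5"]
partitions = [
    [{"c1", "c2", "c3"}, {"c4", "c5"}],
    [{"c1", "c2"}, {"c3", "c4", "c5"}],
    [{"c1", "c2", "c3"}, {"c4"}, {"c5"}],
]
consensus = consensus_clusters(items, partitions)
```

In this toy input, `c1` and `c2` share a cluster in all three partitions, while `c3` and `c4` do so in only one, so a threshold of 0.5 keeps the first pair together and separates the second.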
CITATION STYLE
Bak, O. S., Jensen, M. D., Trudslev, F. M., Windfeld, A., & Lamurias, A. (2023). BinChill: A Metagenomic Binning Ensemble Method. IEEE Access, 11, 49561–49577. https://doi.org/10.1109/ACCESS.2023.3277755