Motivation: Phylogenetic placement (i.e., the insertion of a sequence into a phylogenetic tree) is a basic step in several bioinformatics pipelines, including taxon identification in metagenomic analysis and large scale phylogeny estimation. The most accurate current method is pplacer, which attempts to optimize the placement using maximum likelihood, but it frequently fails on datasets where the phylogenetic tree has 5000 leaves. APPLES is the current most scalable method, and EPA-ng, although more scalable than pplacer and more accurate than APPLES, also fails on many 50,000-taxon trees. Here we describe pplacerDC, a divide-and-conquer approach that enables pplacer to be used when the phylogenetic tree is very large. Results: Our study shows that pplacerDC has excellent accuracy and scalability, matching pplacer where pplacer can run, improving accuracy compared to APPLES and EPA-ng, and is able to run on datasets with up to 100,000 sequences. Availability: The pplacerDC code is available on GitHub at https://github.com/kodingkoning/pplacerDC.
CITATION STYLE
Koning, E., Phillips, M., & Warnow, T. (2021). PpIacerDC: A new scalable phylogenetic placement method. In Proceedings of the 12th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics, BCB 2021. Association for Computing Machinery, Inc. https://doi.org/10.1145/3459930.3469516
Mendeley helps you to discover research relevant for your work.