© 2014 Long et al.; licensee BioMed Central Ltd.

Background: UniFrac is a well-known tool for comparing microbial communities and assessing whether differences between communities are statistically significant. In this paper we identify a discrepancy in the UniFrac methodology that causes semantically equivalent inputs to produce different outputs in tests of statistical significance.

Results: The phylogenetic trees input to UniFrac may or may not contain abundance counts. An isomorphic transform can be defined that converts trees between these two formats without altering their semantic meaning. UniFrac nevertheless produces different outputs for these equivalent forms of the same input tree. We illustrate this using metagenomics data from a lake sediment study.

Conclusions: Results from UniFrac can vary greatly for the same input depending on an arbitrary choice of input format. Practitioners should be aware of this issue and use the tool with caution to ensure consistency and validity in their analyses. We provide a script that transforms inputs between equivalent formats to help researchers achieve this consistency.
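To make the idea of an equivalence-preserving transform between the two input formats concrete, here is a minimal Python sketch. It is not the authors' script. It assumes a hypothetical tree representation (`Node`) in which each leaf carries per-sample abundance counts. It also assumes one plausible form of the transform: `expand` replaces a leaf with total abundance k > 1 by an internal node whose k children are zero-length leaves, each observed once in a single sample, and `collapse` reverses this. The sketch also computes the standard unweighted UniFrac distance (branch length unique to one sample divided by branch length leading to either sample) to show that the observed distance is unchanged by the transform. It does not model the significance test, which is where the paper reports the discrepancy.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node: leaves carry sample -> abundance counts."""
    name: str
    length: float = 0.0                      # branch length to parent
    children: list = field(default_factory=list)
    counts: Counter = field(default_factory=Counter)


def expand(node):
    """Counts form -> one zero-length leaf per individual observation."""
    if node.children:
        return Node(node.name, node.length, [expand(ch) for ch in node.children])
    if sum(node.counts.values()) <= 1:
        return Node(node.name, node.length, counts=Counter(node.counts))
    copies = []
    for sample in sorted(node.counts):
        for _ in range(node.counts[sample]):
            copies.append(Node(f"{node.name}_{len(copies)}", 0.0,
                               counts=Counter({sample: 1})))
    # The original branch length moves to the new internal node.
    return Node(node.name, node.length, copies)


def collapse(node):
    """Inverse of expand (assumes no original node has only zero-length,
    single-count leaf children, which would be ambiguous)."""
    if not node.children:
        return Node(node.name, node.length, counts=Counter(node.counts))
    kids = node.children
    if all(not k.children and k.length == 0.0 and sum(k.counts.values()) == 1
           for k in kids):
        merged = Counter()
        for k in kids:
            merged.update(k.counts)
        return Node(node.name, node.length, counts=merged)
    return Node(node.name, node.length, [collapse(ch) for ch in kids])


def _collect(node, out):
    """Post-order walk recording (branch length, samples below) per node."""
    if node.children:
        below = set()
        for ch in node.children:
            below |= _collect(ch, out)
    else:
        below = {s for s, c in node.counts.items() if c > 0}
    out.append((node.length, below))
    return below


def unweighted_unifrac(root, a, b):
    """Unique branch length / branch length leading to samples a or b."""
    out = []
    _collect(root, out)
    unique = total = 0.0
    for length, below in out[:-1]:           # last entry is the root; skip it
        has_a, has_b = a in below, b in below
        if has_a or has_b:
            total += length
            if has_a != has_b:
                unique += length
    return unique / total if total else 0.0


# Toy tree in the counts form (hypothetical data).
tree = Node("root", 0.0, [
    Node("x", 1.0, counts=Counter({"A": 2, "B": 1})),
    Node("n1", 0.5, [
        Node("y", 0.3, counts=Counter({"A": 1})),
        Node("z", 0.2, counts=Counter({"B": 3})),
    ]),
])
expanded = expand(tree)
print(unweighted_unifrac(tree, "A", "B"), unweighted_unifrac(expanded, "A", "B"))
```

For this toy tree, only the branches to `y` (0.3) and `z` (0.2) are unique to one sample out of 2.0 total, so both forms give a distance of 0.25, and `collapse(expanded)` recovers the original tree. Because the inserted branches have zero length, they add nothing to either sum, which is why, under these assumptions, the two forms carry the same information.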
Long, J. R., Pittet, V., Trost, B., Yan, Q., Vickers, D., Haakensen, M., & Kusalik, A. (2014, August 13). Equivalent input produces different output in the UniFrac significance test. BMC Bioinformatics. BioMed Central Ltd. https://doi.org/10.1186/1471-2105-15-278