An efficient and extensible approach for compressing phylogenetic trees



Abstract

Background: Biologists require new algorithms to efficiently compress and store their large collections of phylogenetic trees. Our previous work showed that TreeZip is a promising approach for compressing phylogenetic trees. In this paper, we extend our TreeZip algorithm to handle trees with weighted branches. Furthermore, using the compressed TreeZip file as input, we have designed an extensible decompressor that can extract subcollections of trees, compute majority and strict consensus trees, and merge tree collections using set operations such as union, intersection, and set difference.

Results: On unweighted phylogenetic trees, TreeZip reduces the size of Newick files by more than 98%. On weighted phylogenetic trees, TreeZip reduces the size of a Newick file by at least 73%. TreeZip can be combined with 7zip with little overhead, allowing space savings in excess of 99% (unweighted) and 92% (weighted). Unlike TreeZip, 7zip is not immune to branch rotations, and it performs worse as the level of variability in the Newick string representation increases. Finally, since the TreeZip compressed text (TRZ) file contains all the semantic information in a collection of trees, we can easily filter and decompress a subset of trees of interest (such as the set of unique trees), or build the resulting consensus tree in a matter of seconds. We also show the ease with which set operations can be performed on TRZ files, at speeds faster than those achieved on Newick or 7zip-compressed Newick files, and without loss of space savings.

Conclusions: TreeZip is an efficient approach for compressing large collections of phylogenetic trees. The semantic and compact nature of the TRZ file allows it to be operated upon directly and quickly, without the need to decompress the original Newick file. We believe that TreeZip will be vital for compressing and archiving trees in the biological community.

© 2011 Matthews and Williams; licensee BioMed Central Ltd.
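The remark about branch rotations can be illustrated with a small sketch. It is not TreeZip's algorithm or the TRZ format. It assumes that trees are compared by their sets of non-trivial bipartitions (leaf splits), and every name in it (parse_newick, leaves, bipartitions) is hypothetical. It handles only simple unweighted Newick strings, and it shows that two rotated strings for the same tree differ as text but agree semantically, so collections can be combined with ordinary set operations.

```python
def parse_newick(s):
    """Parse a simple Newick string (leaf names only, no weights) into nested tuples."""
    s = s.strip().rstrip(";")
    pos = 0

    def node():
        nonlocal pos
        if s[pos] == "(":
            pos += 1                      # skip "("
            children = [node()]
            while s[pos] == ",":
                pos += 1                  # skip ","
                children.append(node())
            pos += 1                      # skip ")"
            return tuple(children)
        start = pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1
        return s[start:pos]               # leaf name

    return node()


def leaves(tree):
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def bipartitions(newick):
    """Return the non-trivial leaf splits of a tree, independent of child order."""
    tree = parse_newick(newick)
    all_taxa = leaves(tree)
    anchor = min(all_taxa)                # fixed taxon used to normalise each split
    splits = set()

    def walk(t):
        if isinstance(t, str):
            return
        side = leaves(t)
        if 1 < len(side) < len(all_taxa) - 1:
            # Store the side that does not contain the anchor taxon.
            splits.add(all_taxa - side if anchor in side else side)
        for child in t:
            walk(child)

    walk(tree)
    return frozenset(splits)


a = "((A,B),(C,D));"
b = "((D,C),(B,A));"   # same tree as a, with branches rotated
c = "((A,C),(B,D));"   # a different topology

collection_1 = {bipartitions(a), bipartitions(c)}
collection_2 = {bipartitions(b)}

common = collection_1 & collection_2      # intersection of the two collections
only_in_1 = collection_1 - collection_2   # set difference
```

Here a and b are different strings, so a general-purpose text compressor sees them as different data, while their bipartition sets are equal and the intersection contains exactly one tree.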



Citation (APA)

Matthews, S. J., & Williams, T. L. (2011). An efficient and extensible approach for compressing phylogenetic trees. BMC Bioinformatics, 12(SUPPL. 10). https://doi.org/10.1186/1471-2105-12-S10-S16
