MrsRF: An efficient MapReduce algorithm for analyzing large collections of evolutionary trees


Abstract

Background: MapReduce is a parallel framework that has been used effectively to design large-scale parallel applications for large computing clusters. In this paper, we evaluate the viability of the MapReduce framework for designing phylogenetic applications. The problem of interest is generating the all-to-all Robinson-Foulds (RF) distance matrix, which has many applications for visualizing and clustering large collections of evolutionary trees. We introduce MrsRF (MapReduce Speeds up RF), a multi-core algorithm that uses the MapReduce paradigm to generate a t × t Robinson-Foulds distance matrix between t trees.

Results: We studied the performance of MrsRF on two large biological tree sets: 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each. Our experiments show that MrsRF is a scalable approach, reaching a speedup of over 18 on 32 total cores. Our results also show that achieving top speedup on a multi-core cluster requires trying different cluster configurations. Finally, we show how to use an RF matrix to summarize collections of phylogenetic trees visually.

Conclusion: Our results show that MapReduce is a promising paradigm for developing multi-core phylogenetic applications. The results also demonstrate that different multi-core configurations must be tested to obtain optimum performance. We conclude that RF matrices play a critical role in developing techniques to summarize large collections of trees.

© 2010 Matthews and Williams; licensee BioMed Central Ltd.
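The abstract does not spell out how the matrix is built. As a hedged illustration only, here is one way an all-to-all RF matrix can be phrased as map, shuffle, and reduce steps: the map step emits a (bipartition, tree id) pair for every bipartition of every tree, the shuffle groups tree ids by bipartition, and the reduce step credits a shared bipartition to every pair of trees listed under the same key. This is a minimal single-process sketch, not the MrsRF source. The nested-tuple tree format and every function name (`bipartitions`, `map_phase`, `shuffle`, `reduce_phase`, `rf_matrix`) are hypothetical, and all trees are assumed to be unrooted and on the same taxon set. The distance reported is the size of the symmetric difference of the two bipartition sets; some authors report half this value.

```python
from collections import defaultdict
from itertools import combinations


def bipartitions(tree):
    """Return the non-trivial splits of an unrooted tree.

    Hypothetical format: a leaf is a string, an internal node is a tuple
    of children. Each split is stored as the side that does NOT contain
    the smallest taxon label, so both sides of an edge map to one key.
    """
    clades = []

    def walk(node):
        # Collect the leaf set below every internal node.
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        clades.append(below)
        return below

    taxa = walk(tree)
    ref, n = min(taxa), len(taxa)
    splits = set()
    for below in clades:
        side = taxa - below if ref in below else below
        if 2 <= len(side) <= n - 2:  # skip trivial (leaf) splits and the root
            splits.add(side)
    return splits


def map_phase(split_sets):
    """Map: emit one (bipartition, tree_id) pair per split of each tree."""
    for tree_id, splits in enumerate(split_sets):
        for split in splits:
            yield split, tree_id


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, t):
    """Reduce: every pair of trees listed under a split shares that split."""
    shared = [[0] * t for _ in range(t)]
    for tree_ids in groups.values():
        for i, j in combinations(tree_ids, 2):
            shared[i][j] += 1
            shared[j][i] += 1
    return shared


def rf_matrix(trees):
    """All-to-all RF distances as |B_i| + |B_j| - 2 * shared(i, j)."""
    split_sets = [bipartitions(tree) for tree in trees]
    t = len(trees)
    shared = reduce_phase(shuffle(map_phase(split_sets)), t)
    sizes = [len(s) for s in split_sets]
    return [
        [0 if i == j else sizes[i] + sizes[j] - 2 * shared[i][j] for j in range(t)]
        for i in range(t)
    ]


if __name__ == "__main__":
    t0 = (("A", "B"), ("C", ("D", "E")))
    t1 = (("A", "C"), ("B", ("D", "E")))
    print(rf_matrix([t0, t1, t0]))  # [[0, 2, 0], [2, 0, 2], [0, 2, 0]]
```

In the example, the first and third trees are identical, and each differs from the second by one bipartition on each side ({C, D, E} versus {B, D, E}), giving a distance of 2. Keying the shuffle by bipartition means the reduce step only visits tree pairs that actually share a split, and different keys can be reduced independently, which is what lets this pattern spread across cores.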

Citation (APA)

Matthews, S. J., & Williams, T. L. (2010). MrsRF: An efficient MapReduce algorithm for analyzing large collections of evolutionary trees. BMC Bioinformatics, 11(Suppl. 1), S15. https://doi.org/10.1186/1471-2105-11-S1-S15