Reconstructing evolutionary trees in parallel for massive sequences

Quan Zou; Shixiang Wan; Xiangxiang Zeng; Zhanshan Sam Ma

Journal Article, Open Access


BMC Systems Biology (2017) 11

DOI: 10.1186/s12918-017-0476-3


Abstract

Background: Building evolutionary trees for massive sets of unaligned DNA sequences is both crucial and challenging. Multiple sequence alignment at this scale is likewise demanding in time and memory. The recently developed Hadoop and Spark platforms offer new opportunities for these classical computational biology problems. In this paper, we address multiple sequence alignment and evolutionary tree reconstruction in parallel.

Results: HPTree, the tool developed in this paper, processes large DNA sequence files quickly. It works well on files larger than 1 GB and performs better than other evolutionary reconstruction tools. Users can run HPTree to reconstruct evolutionary trees on computer clusters or cloud platforms (e.g., Amazon Cloud). HPTree can support population evolution research and metagenomics analysis.

Conclusions: We employ the Hadoop and Spark platforms to design a software tool that reconstructs evolutionary trees from massive unaligned DNA sequences. Clustering and multiple sequence alignment are performed in parallel, and the neighbour-joining method is used to build the tree. The software and its source code are available at http://lab.malab.cn/soft/HPtree/.
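The abstract names neighbour-joining as the tree-building step. As a rough illustration of that step only, the following is a minimal single-machine sketch of the classical neighbour-joining algorithm in Python. It is not HPTree's code and does not use Hadoop or Spark; the function name `neighbor_joining`, the `node0`, `node1`, ... labels for internal nodes, and the five-taxon distance matrix are all assumptions made for this example, not values from the paper.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Return (node, node, branch_length) edges of an unrooted NJ tree.

    names: list of taxon labels (assumed not to start with "node").
    dist:  dict mapping frozenset({a, b}) -> pairwise distance.
    """
    d = dict(dist)          # working copy; internal nodes get added here
    nodes = list(names)
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums of the current distance matrix.
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Join the pair minimising Q(i, j) = (n - 2) * d(i, j) - r(i) - r(j).
        i, j = min(
            combinations(nodes, 2),
            key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]],
        )
        dij = d[frozenset((i, j))]
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        # Branch lengths from the new internal node to i and to j.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        lj = dij - li
        edges += [(u, i, li), (u, j, lj)]
        # Distances from the new node to every remaining node.
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (
                    d[frozenset((i, k))] + d[frozenset((j, k))] - dij
                )
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    # Connect the last two nodes with their remaining distance.
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Illustrative distance matrix (made up for this sketch, not from the paper).
names = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
dist = {
    frozenset((names[x], names[y])): matrix[x][y]
    for x, y in combinations(range(len(names)), 2)
}
edges = neighbor_joining(names, dist)
for parent, child, length in edges:
    print(parent, child, length)
```

The sketch recomputes every row sum in each round, which is simple to read but far too slow for the file sizes the abstract describes; in a tool like HPTree the distance computation and alignment would have to be distributed, and how that is done is not stated in this abstract.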


Citation (APA)

Zou, Q., Wan, S., Zeng, X., & Ma, Z. S. (2017). Reconstructing evolutionary trees in parallel for massive sequences. BMC Systems Biology, 11. https://doi.org/10.1186/s12918-017-0476-3
