HISAT2 Parallelization Method Based on Spark Cluster

Abstract

Sequence alignment is one of the most important tasks in bioinformatics research. It is of great significance for discovering the functional structure and genetic information of nucleic acids and proteins. With the rapid development and gradual maturation of high-throughput sequencing technology, the volume of gene data it produces keeps growing. Because gene sequence alignment is computationally complex and sequencing data sets are large, the alignment step consumes a great deal of computing time. HISAT2 is one of the most popular sequence alignment tools. It offers better sensitivity and accuracy than other software, and its processing speed is also greatly improved. For these reasons, this paper implements a HISAT2 parallelization method based on an Apache Spark cluster. In a comparison experiment between a single machine and the cluster, the Spark-based parallel method reached a speedup of 3.69 times over the single machine while maintaining a high accuracy rate.
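The abstract does not describe how the work is divided across the cluster. As a rough illustration only, one common data-parallel approach for a read aligner is to split the input FASTQ file into chunks of whole records, align each chunk independently against the same index, and compare wall-clock times. The sketch below assumes that approach; the function names (`split_fastq_records`, `hisat2_command`, `speedup`) are hypothetical and are not taken from the paper. The `hisat2` options used (`-x` for the index prefix, `-U` for unpaired reads, `-S` for SAM output) are standard HISAT2 command-line options.

```python
from typing import Iterator, List


def split_fastq_records(lines: List[str], records_per_chunk: int) -> Iterator[List[str]]:
    """Yield chunks of FASTQ lines without ever splitting a 4-line record.

    Assumes the standard 4-lines-per-read FASTQ layout.
    """
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    if records_per_chunk < 1:
        raise ValueError("records_per_chunk must be at least 1")
    step = 4 * records_per_chunk
    for start in range(0, len(lines), step):
        yield lines[start:start + step]


def hisat2_command(index_prefix: str, reads_path: str, sam_path: str) -> List[str]:
    """Build (but do not run) a HISAT2 command line for one chunk of unpaired reads."""
    return ["hisat2", "-x", index_prefix, "-U", reads_path, "-S", sam_path]


def speedup(single_seconds: float, cluster_seconds: float) -> float:
    """Speedup is the single-machine time divided by the cluster time."""
    if cluster_seconds <= 0:
        raise ValueError("cluster_seconds must be positive")
    return single_seconds / cluster_seconds


if __name__ == "__main__":
    # Five toy reads, split into chunks of two records each.
    reads = []
    for i in range(5):
        reads += [f"@read{i}", "ACGT", "+", "IIII"]
    for n, chunk in enumerate(split_fastq_records(reads, 2)):
        print(n, len(chunk) // 4, "records")
    print(hisat2_command("genome_index", "chunk_0.fq", "chunk_0.sam"))
```

In a Spark setting, each chunk would typically become one partition, and the per-chunk SAM outputs would be merged afterwards. Whether the paper does exactly this is not stated in the abstract.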

Cite

Guo, J., Gao, J., & Liu, Z. (2022). HISAT2 Parallelization Method Based on Spark Cluster. In Journal of Physics: Conference Series (Vol. 2179). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/2179/1/012038
