Sequence alignment is one of the most important components of bioinformatics research. It is of great significance for discovering the functional structure and genetic information of nucleic acids and proteins. With the rapid development and gradual maturation of high-throughput sequencing technology, the volume of gene data it produces keeps growing. Because gene sequence alignment is computationally complex and sequencing data sets are large, the alignment step consumes a great deal of computing time. HISAT2 is one of the most popular sequence alignment tools. It offers better sensitivity and accuracy than other software, along with substantially improved processing speed. For these reasons, this paper implements a HISAT2 parallelization method based on an Apache Spark cluster. In a comparison experiment between a single machine and the cluster, the Spark-based parallel HISAT2 method ran 3.69 times faster than the single-machine run while maintaining a high accuracy rate.
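The abstract does not describe how the reads are distributed across the cluster. As a rough illustration only, a data-parallel aligner typically needs its input split into chunks that keep each read intact, so that every chunk can be aligned independently. The sketch below assumes unwrapped four-line FASTQ records; the names `split_fastq_records` and `reads_per_chunk` are hypothetical and do not come from the paper, and no Spark or HISAT2 call is shown.

```python
from typing import Iterable, Iterator, List

# Assumption: each read occupies exactly 4 lines
# (header, sequence, '+', quality), i.e. unwrapped FASTQ.
FASTQ_LINES_PER_READ = 4


def split_fastq_records(lines: Iterable[str],
                        reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines without splitting a read across chunks.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be >= 1")
    limit = reads_per_chunk * FASTQ_LINES_PER_READ
    chunk: List[str] = []
    for line in lines:
        chunk.append(line.rstrip("\n"))
        if len(chunk) == limit:
            yield chunk
            chunk = []
    if chunk:
        # A trailing partial record means the input was truncated.
        if len(chunk) % FASTQ_LINES_PER_READ != 0:
            raise ValueError("truncated FASTQ record at end of input")
        yield chunk


# Three toy reads split into chunks of at most two reads each.
toy_fastq = [
    "@read1", "ACGT", "+", "IIII",
    "@read2", "TTGA", "+", "IIII",
    "@read3", "GGCC", "+", "IIII",
]
chunks = list(split_fastq_records(toy_fastq, reads_per_chunk=2))
```

Each chunk could then be handed to a separate worker that runs the aligner on it, with the per-chunk outputs merged afterwards. How the paper actually partitions the data and merges the results is not stated in the abstract.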
CITATION STYLE
Guo, J., Gao, J., & Liu, Z. (2022). HISAT2 Parallelization Method Based on Spark Cluster. In Journal of Physics: Conference Series (Vol. 2179). IOP Publishing Ltd. https://doi.org/10.1088/1742-6596/2179/1/012038