SparkBLAST: Scalable BLAST processing using in-memory operations

Marcelo Rodrigo de Castro; Catherine dos Santos Tostes; Alberto M.R. Dávila; Hermes Senger; Fabricio A.B. da Silva

Journal Article (Open Access)

BMC Bioinformatics (2017) 18(1)

DOI: 10.1186/s12859-017-1723-8

Abstract

Background: The demand for processing ever-increasing amounts of genomic data has raised new challenges for the implementation of highly scalable and efficient computational systems. In this paper we propose SparkBLAST, a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computational resources and Apache Spark as the coordination framework. As a proof of concept, some radionuclide-resistant bacterial genomes were selected for similarity analysis.

Results: Experiments on the Google and Microsoft Azure clouds demonstrated that SparkBLAST outperforms an equivalent system implemented on Hadoop in terms of speedup and execution times.

Conclusions: The superior performance of SparkBLAST is mainly due to the in-memory operations available through the Spark framework, which reduce the number of local I/O operations required for distributed BLAST processing.
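
The results are reported in terms of speedup and execution time. As a reading aid, the sketch below shows how these two metrics are conventionally related. It is not taken from the paper: the function names and every timing value are hypothetical and chosen only for illustration.

```python
def speedup(t_baseline: float, t_parallel: float) -> float:
    """Conventional speedup: baseline execution time / parallel execution time."""
    if t_baseline <= 0 or t_parallel <= 0:
        raise ValueError("execution times must be positive")
    return t_baseline / t_parallel


def efficiency(t_baseline: float, t_parallel: float, workers: int) -> float:
    """Parallel efficiency: speedup divided by the number of workers."""
    if workers < 1:
        raise ValueError("workers must be at least 1")
    return speedup(t_baseline, t_parallel) / workers


# Hypothetical timings in seconds (NOT measurements from the paper):
# a run taking 3600 s on one worker and 450 s on 16 workers.
example_speedup = speedup(3600.0, 450.0)
example_efficiency = efficiency(3600.0, 450.0, 16)
print(example_speedup, example_efficiency)
```

An efficiency close to 1 would indicate near-linear scaling; values well below 1 suggest that coordination or I/O overhead is absorbing part of the added capacity.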

Citation (APA)

de Castro, M. R., Tostes, C. dos S., Dávila, A. M. R., Senger, H., & da Silva, F. A. B. (2017). SparkBLAST: Scalable BLAST processing using in-memory operations. BMC Bioinformatics, 18(1). https://doi.org/10.1186/s12859-017-1723-8
