An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments

Jongtae Lim; Kim; Hyeonbyeong Lee; Dojin Choi; Kyoungsoo Bok; Jaesoo Yoo

Journal Article, Open Access

Applied Sciences (Switzerland) (2022) 12(1)

DOI: 10.3390/app12010122

Abstract

Various distributed processing schemes have been studied to make efficient use of large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme that considers communication costs in Spark environments in order to reduce I/O costs during SPARQL query processing. To process a query over an RDF graph stored in a distributed environment, we divide the SPARQL query into several subqueries based on its WHERE clause. The proposed scheme reduces data communication costs by using an index to group the divided subqueries by related nodes and processing them there. For the grouped subqueries, it calculates the cost of all possible query execution paths and selects an efficient one. The selection algorithm considers the data parsing cost of each possible query execution path, the amount of data communication, and the queue time per node. Various performance evaluations show that the proposed scheme outperforms the existing schemes.
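
The pipeline outlined in the abstract (split the WHERE clause into subqueries, group them by node through an index, then cost every execution path) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract does not give the cost formula, so the additive cost model below is an assumption, and every name and number in it (TriplePattern, group_by_node, path_cost, choose_path, the predicate-to-node index, the cost tables) is hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class TriplePattern:
    """One subquery: a single triple pattern taken from the WHERE clause."""
    subject: str
    predicate: str
    obj: str


def group_by_node(patterns, predicate_index):
    """Group subqueries by the node that the (hypothetical) index assigns to their predicate."""
    groups = defaultdict(list)
    for pattern in patterns:
        groups[predicate_index[pattern.predicate]].append(pattern)
    return dict(groups)


def path_cost(path, groups, parse_cost, transfer_cost, queue_time):
    """Assumed additive cost: parsing plus queue time per node, plus transfer between consecutive nodes."""
    node_cost = sum(parse_cost * len(groups[n]) + queue_time[n] for n in path)
    comm_cost = sum(transfer_cost[(a, b)] for a, b in zip(path, path[1:]))
    return node_cost + comm_cost


def choose_path(groups, parse_cost, transfer_cost, queue_time):
    """Enumerate every ordering of the node groups and return the cheapest one with its cost."""
    best = min(
        permutations(groups),
        key=lambda p: path_cost(p, groups, parse_cost, transfer_cost, queue_time),
    )
    return best, path_cost(best, groups, parse_cost, transfer_cost, queue_time)


# Hypothetical query, index, and cost tables (illustrative values only).
patterns = [
    TriplePattern("?s", "rdf:type", "ub:Student"),
    TriplePattern("?s", "ub:takesCourse", "?c"),
    TriplePattern("?t", "ub:teacherOf", "?c"),
    TriplePattern("?s", "ub:name", "?n"),
]
predicate_index = {
    "rdf:type": "n1",
    "ub:takesCourse": "n2",
    "ub:teacherOf": "n2",
    "ub:name": "n3",
}
transfer_cost = {
    ("n1", "n2"): 5, ("n2", "n1"): 9,
    ("n1", "n3"): 4, ("n3", "n1"): 2,
    ("n2", "n3"): 1, ("n3", "n2"): 7,
}
queue_time = {"n1": 3, "n2": 1, "n3": 2}

groups = group_by_node(patterns, predicate_index)
best_path, best_cost = choose_path(groups, 1, transfer_cost, queue_time)
print(best_path, best_cost)
```

Because the subqueries are grouped first, the search enumerates orderings of three node groups (six paths) rather than orderings of four individual subqueries. In this simplified model only the communication term depends on the order, so a more faithful sketch would also need estimates of intermediate result sizes, which the abstract does not provide.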

Citation (APA)

Lim, J., Kim, Lee, H., Choi, D., Bok, K., & Yoo, J. (2022). An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments. Applied Sciences (Switzerland), 12(1). https://doi.org/10.3390/app12010122
