An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments

Abstract

Various distributed processing schemes have been studied to efficiently utilize large-scale RDF graphs in semantic web services. This paper proposes a new distributed SPARQL query processing scheme that considers communication costs in Spark environments in order to reduce I/O costs during SPARQL query processing. To process a query over an RDF graph stored in a distributed environment, the scheme divides the SPARQL query into several subqueries based on its WHERE clause. It reduces data communication costs by using an index to group the divided subqueries by the nodes that hold the related data and processing each group there. For the grouped subqueries, the scheme then calculates the cost of every possible query execution path and selects an efficient one, using an algorithm that considers the data parsing cost, the amount of data communication, and the queue time per node. Performance evaluations show that the proposed scheme outperforms existing schemes.
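
The pipeline described in the abstract (split the WHERE clause, group subqueries by node through an index, cost every execution path, pick the cheapest) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and does not come from the paper: the predicate-to-node index, the per-node statistics, the node names, and above all the additive cost formula, since the abstract names the cost factors but not how they are combined. The sketch runs locally and does not use Spark.

```python
import re
from itertools import permutations

# Hypothetical index: predicate -> node assumed to store triples with that predicate.
PREDICATE_INDEX = {
    "foaf:name": "node1",
    "foaf:knows": "node1",
    "ex:worksAt": "node2",
}

# Hypothetical per-node statistics in arbitrary units (made-up values).
NODE_STATS = {
    "node1": {"parse_cost": 4.0, "result_size": 10.0, "queue_time": 1.0},
    "node2": {"parse_cost": 2.0, "result_size": 3.0, "queue_time": 5.0},
}


def split_where(where_body):
    """Split a simple WHERE body into triple patterns, one subquery each.

    Assumes a plain basic graph pattern whose triple patterns are separated
    by whitespace followed by '.', with no literals that contain spaces.
    """
    parts = re.split(r"\s+\.\s*", where_body.strip())
    return [tuple(part.split()) for part in parts if part]


def group_by_node(patterns, index):
    """Group subqueries by the node that the index maps their predicate to."""
    groups = {}
    for subject, predicate, obj in patterns:
        groups.setdefault(index[predicate], []).append((subject, predicate, obj))
    return groups


def path_cost(path, stats):
    """Assumed additive cost of visiting the nodes in the given order.

    Each node contributes its parsing cost and queue time; every node except
    the last also ships its intermediate result to the next node.
    """
    local = sum(stats[n]["parse_cost"] + stats[n]["queue_time"] for n in path)
    shipped = sum(stats[n]["result_size"] for n in path[:-1])
    return local + shipped


def choose_path(groups, stats):
    """Enumerate every ordering of the node groups and return the cheapest."""
    return min(permutations(groups), key=lambda path: path_cost(path, stats))


where = "?a foaf:name ?n . ?a foaf:knows ?b . ?b ex:worksAt ?c ."
patterns = split_where(where)
groups = group_by_node(patterns, PREDICATE_INDEX)
best = choose_path(groups, NODE_STATS)
print(best, path_cost(best, NODE_STATS))
```

Under these made-up statistics, visiting node2 first is cheaper because its intermediate result is smaller to ship, even though its queue time is longer. Exhaustive enumeration grows factorially with the number of node groups, so this form is only practical for the small group counts that grouping is meant to produce.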

Cite

Lim, J., Kim, Lee, H., Choi, D., Bok, K., & Yoo, J. (2022). An Efficient Distributed SPARQL Query Processing Scheme Considering Communication Costs in Spark Environments. Applied Sciences (Switzerland), 12(1). https://doi.org/10.3390/app12010122