Scalable SPARQL querying processing on large RDF data in cloud computing environment

Buwen Wu; Hai Jin; Pingpeng Yuan

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013), Vol. 7719 LNCS, pp. 631–646

DOI: 10.1007/978-3-642-37015-1_55

Abstract

Recently, the flexibility of the RDF data model has led an increasing number of organizations and communities to make their data available in RDF format, and there is a growing need to query these data in a scalable and efficient way. MapReduce (MR) is a parallel processing framework for large data-intensive workloads, but it does not directly support join-intensive workloads. In this paper, we present a schema-based hybrid partitioning technique that places RDF triples according to the relationships between them, reducing the number of MR cycles required by each SPARQL query job. We then propose a lightweight sideways information passing technique that passes join information across MR jobs to reduce the intermediate results involved in join operations. Experimental results on the LUBM benchmark show that our approaches achieve a substantial performance improvement, outperforming the previous system by a factor of 2 to 20. © 2013 Springer-Verlag.
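The abstract describes sideways information passing only at a high level. The sketch below is a generic illustration of the idea under stated assumptions, not the authors' implementation. It simulates two MapReduce jobs in memory for a two-pattern query and collects the ?x bindings produced by the first job as the "join information". The second job's map phase then uses these bindings to drop triples that cannot join. The toy triples, the L/T record tagging, and helper names such as `run_mapreduce` and `make_job2_map` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data loosely modeled on LUBM-style predicates;
# these are NOT the paper's datasets or results.
TRIPLES = [
    ("s1", "type", "GraduateStudent"),
    ("s2", "type", "GraduateStudent"),
    ("s3", "type", "Professor"),
    ("s1", "takesCourse", "c1"),
    ("s2", "takesCourse", "c2"),
    ("s3", "takesCourse", "c1"),
    ("s4", "takesCourse", "c3"),
]

def run_mapreduce(records, mapper, reducer):
    """Simulate one MR job in memory: map, shuffle by key, reduce.

    Returns the reducer output and the number of intermediate
    (key, value) pairs shuffled from the map phase to the reduce phase.
    """
    groups = defaultdict(list)
    shuffled = 0
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
            shuffled += 1
    out = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out, shuffled

# Query: SELECT ?x ?c WHERE { ?x type GraduateStudent . ?x takesCourse ?c }

def job1_map(triple):
    s, p, o = triple
    if p == "type" and o == "GraduateStudent":
        yield s, None

def job1_reduce(key, values):
    yield key  # one ?x binding per distinct subject

def make_job2_map(join_keys):
    """Build the join mapper; join_keys=None disables the SIP filter."""
    def job2_map(record):
        tag, payload = record
        if tag == "L":  # ?x binding produced by job 1
            yield payload, ("L", None)
            return
        s, p, o = payload  # raw triple
        if p != "takesCourse":
            return
        # Sideways information passing: skip triples whose join key
        # cannot match any ?x binding produced by job 1.
        if join_keys is not None and s not in join_keys:
            return
        yield s, ("R", o)
    return job2_map

def job2_reduce(key, values):
    # Reduce-side join: emit (?x, ?c) only if ?x was bound by job 1.
    if any(tag == "L" for tag, _ in values):
        for tag, course in values:
            if tag == "R":
                yield key, course

def answer(use_sip):
    xs, _ = run_mapreduce(TRIPLES, job1_map, job1_reduce)
    # Join information passed across jobs: the set of ?x bindings.
    # A distributed system might ship a compact Bloom filter instead.
    join_keys = set(xs) if use_sip else None
    records = [("L", x) for x in xs] + [("T", t) for t in TRIPLES]
    return run_mapreduce(records, make_job2_map(join_keys), job2_reduce)

if __name__ == "__main__":
    print(answer(use_sip=False))  # same answers, more pairs shuffled
    print(answer(use_sip=True))   # same answers, fewer pairs shuffled
```

Both runs return the same bindings, (s1, c1) and (s2, c2). The SIP run shuffles 4 intermediate pairs instead of 6, because the triples for s3 and s4 are filtered out in the map phase. This toy-scale reduction in intermediate results illustrates the effect the abstract attributes to passing join information across MR jobs.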

Citation (APA)

Wu, B., Jin, H., & Yuan, P. (2013). Scalable SPARQL querying processing on large RDF data in cloud computing environment. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7719 LNCS, pp. 631–646). https://doi.org/10.1007/978-3-642-37015-1_55
