Scalable SPARQL querying processing on large RDF data in cloud computing environment

Abstract

The flexibility of the RDF data model has led a growing number of organizations and communities to publish their data in RDF format, and with it comes a growing need to query these data in a scalable and efficient way. MapReduce is a parallel processing solution for large data-intensive workloads, but it does not directly support join-intensive workloads. In this paper, we present a schema-based hybrid partitioning technique that places RDF triples according to the relationships between them, reducing the number of MapReduce (MR) cycles needed in each SPARQL query job. We then propose a lightweight sideways information passing technique that passes join information across MR jobs to reduce the intermediate results involved in join operations. Experimental results on the LUBM benchmark show that our approaches achieve a substantial performance improvement and outperform the previous system by a factor of 2 to 20. © 2013 Springer-Verlag.
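
The abstract does not give the details of the sideways information passing technique, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a two-pattern join on a shared subject, simulates the map and reduce phases in plain Python, and uses hypothetical names throughout (`join_keys`, `filtered_join`, and the LUBM-style predicates `takesCourse` and `advisor`). The set of join keys seen for the first pattern is passed to the map phase of the join so that triples which can never join are dropped before the shuffle.

```python
from collections import defaultdict

# Hypothetical toy data: (subject, predicate, object) triples.
TRIPLES = [
    ("s1", "takesCourse", "c1"),
    ("s2", "takesCourse", "c2"),
    ("s1", "advisor", "p1"),
    ("s3", "advisor", "p2"),
]


def join_keys(triples, predicate):
    """Collect the subjects that match the first triple pattern."""
    return {s for s, p, _ in triples if p == predicate}


def filtered_join(triples, left_pred, right_pred):
    """Join '?x left_pred ?a' with '?x right_pred ?b' on the subject ?x.

    Returns the sorted join results and the number of records that
    reached the simulated shuffle.
    """
    keys = join_keys(triples, left_pred)  # information passed sideways

    # Simulated map phase: group by join key, dropping right-side
    # triples whose subject never appears on the left side.
    groups = defaultdict(lambda: ([], []))
    shuffled = 0
    for s, p, o in triples:
        if p == left_pred:
            groups[s][0].append(o)
            shuffled += 1
        elif p == right_pred and s in keys:
            groups[s][1].append(o)
            shuffled += 1

    # Simulated reduce phase: pair left and right values per key.
    results = [
        (s, a, b)
        for s, (left, right) in groups.items()
        for a in left
        for b in right
    ]
    return sorted(results), shuffled


if __name__ == "__main__":
    print(filtered_join(TRIPLES, "takesCourse", "advisor"))
```

In this toy input the triple for `s3` is filtered out before the shuffle, so three records are shuffled instead of four while the join result is unchanged. In a real MapReduce setting the key set would typically be shipped to the mappers in a compact form such as a Bloom filter, which is a common choice and an assumption here rather than something the abstract states.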

Cite

Wu, B., Jin, H., & Yuan, P. (2013). Scalable SPARQL querying processing on large RDF data in cloud computing environment. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7719 LNCS, pp. 631–646). https://doi.org/10.1007/978-3-642-37015-1_55
