Efficient SPARQL query evaluation via automatic data partitioning

Tao Yang; Jinchuan Chen; Xiaoyan Wang; Yueguo Chen; Xiaoyong Du

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2013) 7826 LNCS(PART 2) 244-258

DOI: 10.1007/978-3-642-37450-0_18

9 Citations

17 Readers

Abstract

The volume of RDF data has grown rapidly over the last five years; for example, the Linked Open Data cloud has grown from 2 billion to 50 billion RDF triples. With its strong scalability, a cloud computing platform such as Hadoop is a good choice for processing queries over large data sets. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned also affects performance. Specifically, a good partitioning can greatly reduce or even entirely avoid cross-node joins, and thereby significantly reduce the cost of query evaluation. Building on HadoopDB, this work processes SPARQL queries in a hybrid architecture in which Map/Reduce handles the computing tasks and an RDF query engine, RDF-3X, stores the data and evaluates join operations over local data. Based on an analysis of query workloads, we propose a novel algorithm for automatically partitioning RDF data. We also present an approximate solution for physically placing the partitions so as to reduce data redundancy. All the proposed approaches are evaluated through extensive experiments over large RDF data sets. © Springer-Verlag 2013.
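
To make the claim about cross-node joins concrete, here is a minimal sketch. It is not the workload-driven algorithm proposed in the paper, which the abstract does not specify; it assumes plain hash partitioning on the subject, and the names `node_of`, `partition_by_subject`, and `local_star_join` are hypothetical. Because all triples that share a subject land on the same node, a star-shaped pattern such as `?x name ?a . ?x worksAt ?b` can be answered by joining within each partition and concatenating the results, with no cross-node join.

```python
import zlib
from collections import defaultdict


def node_of(term, num_nodes):
    # Stable hash (CRC32), so placement is deterministic across runs.
    return zlib.crc32(term.encode("utf-8")) % num_nodes


def partition_by_subject(triples, num_nodes):
    # Assumption: simple subject hashing, not the paper's partitioning algorithm.
    parts = defaultdict(list)
    for s, p, o in triples:
        parts[node_of(s, num_nodes)].append((s, p, o))
    return dict(parts)


def local_star_join(part, p1, p2):
    """Evaluate ?x p1 ?a . ?x p2 ?b using only the triples of one partition."""
    by_subject = defaultdict(lambda: defaultdict(list))
    for s, p, o in part:
        by_subject[s][p].append(o)
    return [
        (s, a, b)
        for s, preds in by_subject.items()
        for a in preds.get(p1, [])
        for b in preds.get(p2, [])
    ]


# Toy data set, invented for illustration only.
triples = [
    ("alice", "name", "Alice"),
    ("alice", "worksAt", "acme"),
    ("bob", "name", "Bob"),
    ("bob", "worksAt", "acme"),
    ("acme", "locatedIn", "Berlin"),
]

parts = partition_by_subject(triples, num_nodes=3)

# Each node joins only its own triples; the union is the complete answer.
local_results = sorted(
    row for part in parts.values() for row in local_star_join(part, "name", "worksAt")
)
```

A pattern that joins a subject with an object (for example `?x worksAt ?y . ?y locatedIn ?z`) is not guaranteed to be local under this scheme, which is the kind of case where a partitioning informed by the query workload would matter.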

Citation (APA)

Yang, T., Chen, J., Wang, X., Chen, Y., & Du, X. (2013). Efficient SPARQL query evaluation via automatic data partitioning. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7826 LNCS, pp. 244–258). https://doi.org/10.1007/978-3-642-37450-0_18
