Robust runtime optimization and skew-resistant execution of analytical SPARQL queries on Pig

Spyros Kotoulas; Jacopo Urbani; Peter Boncz; Peter Mika

Conference Proceedings, Open Access

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7649 LNCS(PART 1) 247-262

DOI: 10.1007/978-3-642-35176-1_16

Abstract

We describe a system that incrementally translates SPARQL queries to Pig Latin and executes them on a Hadoop cluster. The system is designed to work efficiently on complex queries with many self-joins over huge datasets, and to avoid job failures even when joins exhibit unexpectedly high value skew. To be robust against cost estimation errors, it interleaves query optimization with query execution, determining the next steps to take based on data samples and statistics gathered during the previous step. Furthermore, we have developed a novel skew-resistant join algorithm that replicates tuples corresponding to popular keys. We evaluate the effectiveness of our approach both on a synthetic benchmark known to generate complex queries (BSBM-BI) and on a Yahoo! data analysis use case over RDF data crawled from the web. Our results indicate that the system can process huge datasets without pre-computed statistics while exhibiting good load-balancing properties. © 2012 Springer-Verlag Berlin Heidelberg.
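The abstract does not spell out the join algorithm, so the following is only a minimal in-memory sketch of the general idea of replicating tuples for popular keys, not the authors' implementation. The function name `skew_resistant_join`, the sampling threshold `hot_fraction`, and the simulated `num_reducers` partitions are all assumptions made for illustration. Keys that appear frequently in a sample of the left input are treated as popular: their left tuples are scattered across all partitions, and the matching right tuples are replicated to every partition so that each scattered tuple still finds its join partners.

```python
import random
from collections import Counter, defaultdict


def skew_resistant_join(left, right, num_reducers=4, sample_size=100,
                        hot_fraction=0.1, seed=0):
    """Join two lists of (key, value) pairs, spreading popular keys.

    Illustrative sketch only: partitions stand in for reducers, and the
    popularity test is a simple sample-frequency threshold (an assumption).
    """
    rng = random.Random(seed)

    # Step 1: sample the left input to detect popular ("hot") keys.
    sample = rng.sample(left, min(sample_size, len(left)))
    counts = Counter(key for key, _ in sample)
    hot = {k for k, c in counts.items() if c > hot_fraction * len(sample)}

    # Step 2: assign tuples to partitions.
    # Each partition holds a pair: (left tuples, right tuples).
    partitions = defaultdict(lambda: ([], []))
    for key, value in left:
        if key in hot:
            target = rng.randrange(num_reducers)  # scatter hot left tuples
        else:
            target = hash(key) % num_reducers     # ordinary hash partitioning
        partitions[target][0].append((key, value))
    for key, value in right:
        if key in hot:
            targets = range(num_reducers)         # replicate hot right tuples
        else:
            targets = [hash(key) % num_reducers]
        for target in targets:
            partitions[target][1].append((key, value))

    # Step 3: run a local hash join inside each partition.
    result = []
    for lefts, rights in partitions.values():
        index = defaultdict(list)
        for key, value in rights:
            index[key].append(value)
        for key, value in lefts:
            for other in index.get(key, []):
                result.append((key, value, other))
    return result


# Example input: key "a" is heavily skewed on the left side.
left = [("a", i) for i in range(50)] + [("b", 100), ("c", 101)]
right = [("a", "x"), ("a", "y"), ("b", "z"), ("d", "w")]
result = skew_resistant_join(left, right)
```

Because every left tuple lands in exactly one partition and its matching right tuples are guaranteed to be present there, each joined pair is produced once. The cost is the extra copies of right-side tuples for popular keys, which is the trade-off a replication-based scheme accepts in exchange for balanced load.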

APA

Kotoulas, S., Urbani, J., Boncz, P., & Mika, P. (2012). Robust runtime optimization and skew-resistant execution of analytical SPARQL queries on Pig. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7649 LNCS, pp. 247–262). Springer Verlag. https://doi.org/10.1007/978-3-642-35176-1_16
