RAPID: Enabling Scalable Ad-Hoc Analytics on the Semantic Web

by Radhika Sridhar, Padmashree Ravindra, Kemafor Anyanwu
The Semantic Web - ISWC 2009 (2009)

Abstract

As the amount of available RDF data continues to increase steadily, there is growing interest in developing efficient methods for analyzing such data. While recent efforts have focused on developing efficient methods for traditional data processing, analytical processing, which typically involves more complex queries, has received much less attention. The use of cost-effective parallelization techniques such as Google's Map-Reduce offers significant promise for achieving Web-scale analytics. However, currently available implementations are designed for simple data processing on structured data. In this paper, we present a language, RAPID, for scalable ad-hoc analytical processing of RDF data on Map-Reduce frameworks. It builds on Yahoo's Pig Latin by introducing primitives based on a specialized join operator, the MD-join, for expressing analytical tasks in a manner that is more amenable to parallel processing, as well as primitives for coping with the semi-structured nature of RDF data. Experimental evaluation results demonstrate significant performance improvements for analytical processing of RDF data over existing Map-Reduce based techniques.

Available from www.springerlink.com

A. Bernstein et al. (Eds.): ISWC 2009, LNCS 5823, pp. 715–730, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Radhika Sridhar, Padmashree Ravindra, and Kemafor Anyanwu
North Carolina State University
{rsridha,pravind2,kogan}@ncsu.edu
Keywords: RDF, Scalable Analytical Processing, Map-Reduce, Pig Latin.
1 Introduction
The broadening adoption of Semantic Web tenets is giving rise to a growing amount of
data represented using the foundational metadata representation language, Resource
Description Framework (RDF) [19]. In order to provide adequate support for
knowledge discovery tasks such as those that exist in scientific research communities, many of
which have adopted Semantic Web technologies, it is important to consider how more
complex data analysis can be enabled efficiently at Semantic Web scale. Analytical
processing involves more complex queries than traditional data processing, often
requiring multiple aggregations over multiple groupings of data. These queries are
often difficult to express and optimize using traditional join, grouping, and aggregation
operators and algorithms. The following two examples, based on the simple Sales data
in Figure 1, illustrate the challenges with analytical queries. Assume we would like to
find “for each customer, their total sales amounts for Jan, Jun and Nov for purchases
made in the state NC”, i.e., to compute the relation (cust, jansales, junsales, novsales).
Using traditional query operators, this query would be expressed as a union query,
resulting in three subqueries (one computing the aggregates for each of the months
specified) and then an outer join for merging the related tuples for each customer. Each
of these subqueries needs a separate scan of the same, typically large, table. A
slightly more demanding example would be to find “for each product and month of
2000, the number of sales that were between the previous and following months’
average sales”. Computing the answer to this query requires that, for each product and
month, we compute aggregates from tuples outside the group (the next and previous
months’ average sales). After these values are computed, we have enough information
to compute the output aggregate (count). This query also requires multi-pass
aggregation with a lot of repeated processing of the same set of tuples, such as repeated
scans of relations just to compute slightly different values, e.g., the previous month’s
aggregate vs. the next month’s aggregate. High-end database systems and OLAP servers
such as Teradata, Tandem, NCR, Oracle-nCUBE, and the Microsoft and SAS OLAP
servers, with specialized parallel architectures and sophisticated indexing schemes,
employ techniques to mitigate this inefficiency. However, such systems are very
expensive and are targeted at enterprise-scale processing, making it difficult to scale
them to the Web in a straightforward and cost-effective way.
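
To make the cost of the first example query concrete, the following Python sketch contrasts the traditional formulation (one scan per month, then an outer-join-style merge) with a single scan that fills all three aggregates at once. It is only an illustration under stated assumptions: the tuples in SALES are hypothetical and are not the data of Figure 1, the function names are invented here, and the single-scan version is plain conditional aggregation, not the MD-join operator or any RAPID primitive.

```python
from collections import defaultdict

# Hypothetical sales tuples (cust, month, state, amount); NOT the data of Figure 1.
SALES = [
    ("c1", "Jan", "NC", 100.0),
    ("c1", "Jun", "NC", 50.0),
    ("c1", "Jan", "SC", 70.0),   # wrong state, ignored
    ("c2", "Nov", "NC", 30.0),
    ("c2", "Jan", "NC", 20.0),
    ("c2", "Mar", "NC", 99.0),   # month not requested, ignored
]
MONTHS = ("Jan", "Jun", "Nov")


def three_scan(sales):
    """Traditional plan: one full scan per month, then merge on cust."""
    partials = []
    for month in MONTHS:
        totals = defaultdict(float)
        for cust, m, state, amount in sales:  # separate scan for each month
            if state == "NC" and m == month:
                totals[cust] += amount
        partials.append(totals)
    # Outer-join-style merge; a missing month is shown as 0.0 here,
    # where a real outer join would produce NULL.
    customers = set().union(*partials)
    return {c: tuple(p.get(c, 0.0) for p in partials) for c in customers}


def single_scan(sales):
    """One scan that updates (jansales, junsales, novsales) per customer."""
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for cust, m, state, amount in sales:
        if state == "NC" and m in MONTHS:
            totals[cust][MONTHS.index(m)] += amount
    return {c: tuple(v) for c, v in totals.items()}
```

Both functions produce the same (cust, jansales, junsales, novsales) relation on this toy input; the difference is that three_scan reads the table three times while single_scan reads it once, which is the kind of repeated work the paragraph above describes.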

Fig. 1. RDF representation of Sales relation
