Mind the data skew: Distributed inferencing by speeddating in elastic regions

Abstract

Semantic Web data exhibits very skewed frequency distributions among terms. Efficient large-scale distributed reasoning methods should maintain load-balance in the face of such highly skewed distribution of input data. We show that term-based partitioning, used by most distributed reasoning approaches, has limited scalability due to load-balancing problems. We address this problem with a method for data distribution based on clustering in elastic regions. Instead of assigning data to fixed peers, data flows semi-randomly in the network. Data items "speed-date" while being temporarily collocated in the same peer. We introduce a bias in the routing to allow semantically clustered neighborhoods to emerge. Our approach is self-organising, efficient and does not require any central coordination. We have implemented this method on the MaRVIN platform and have performed experiments on large real-world datasets, using a cluster of up to 64 nodes. We compute the RDFS closure over different datasets and show that our clustering algorithm drastically reduces computation time, calculating the RDFS closure of 200 million triples in 7.2 minutes. © 2010 International World Wide Web Conference Committee (IW3C2).
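
The load-balancing problem described in the abstract can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the MaRVIN implementation or the paper's algorithm: term frequencies are drawn from a Zipf-like distribution, term-based partitioning is modelled as a simple modulo hash, and all function names and parameter values (`zipf_terms`, `imbalance`, `compare`, the exponent of 1.2) are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def zipf_terms(num_items, vocab_size, exponent, rng):
    """Draw term ids with probability proportional to 1 / rank**exponent."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, vocab_size + 1)]
    return rng.choices(range(vocab_size), weights=weights, k=num_items)


def imbalance(assignments, num_peers):
    """Ratio of the busiest peer's load to the mean load (1.0 is balanced)."""
    loads = Counter(assignments)
    mean_load = len(assignments) / num_peers
    return max(loads.values()) / mean_load


def compare(num_items=100_000, vocab_size=10_000, num_peers=64,
            exponent=1.2, seed=0):
    """Return (term-based imbalance, uniform-random imbalance)."""
    rng = random.Random(seed)
    terms = zipf_terms(num_items, vocab_size, exponent, rng)
    # Term-based partitioning (assumed modulo hash): all items that share
    # a term are sent to the same peer, so a frequent term overloads it.
    term_based = [term % num_peers for term in terms]
    # Baseline: each item goes to a uniformly random peer, ignoring its term.
    uniform = [rng.randrange(num_peers) for _ in terms]
    return imbalance(term_based, num_peers), imbalance(uniform, num_peers)


if __name__ == "__main__":
    term_ratio, uniform_ratio = compare()
    print(f"term-based max/mean load: {term_ratio:.2f}")
    print(f"uniform    max/mean load: {uniform_ratio:.2f}")
```

Under these assumptions the peer that owns the most frequent term carries many times the mean load, while uniformly random placement stays close to balanced. Random placement alone does not collocate items that need to be joined, which is the gap the biased routing described in the abstract is meant to close.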

Cite

Kotoulas, S., Oren, E., & Van Harmelen, F. (2010). Mind the data skew: Distributed inferencing by speeddating in elastic regions. In Proceedings of the 19th International Conference on World Wide Web, WWW ’10 (pp. 531–540). https://doi.org/10.1145/1772690.1772745
