Towards Internet-Scale Cardinality Estimation of XPath Queries over Distributed XML Data

  • Slavov V
  • Rao P

Abstract

In the last decade, we have witnessed the huge success of the peer-to-peer (P2P) computing model. This has led to the development of many Internet-scale applications and systems that are used commercially. Recently, the problem of computing statistics over data in Internet-scale systems has received attention. In this paper, we discuss the problem of cardinality estimation of XPath queries over distributed XML data stored in an Internet-scale environment such as a P2P network. Such cardinality estimates are useful for XQuery optimization and statistical hypothesis testing in domains such as health informatics. We present a novel gossip algorithm called XGossip, which, given an XPath query, estimates the number of XML documents that contain a match for the query. XGossip is designed to be scalable, decentralized, and robust to failures, properties that are desirable in a large-scale distributed system. XGossip employs a novel divide-and-conquer strategy for load balancing and reducing bandwidth consumption. We conduct theoretical analyses of the quality of cardinality estimates, message complexity, and bandwidth consumption. We present a preliminary performance evaluation on PlanetLab and discuss our ongoing work.
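To make the idea of gossip-based counting concrete, the following is a minimal sketch of a generic push-sum style gossip that lets every peer estimate a network-wide total from purely local exchanges. It is not XGossip itself: the abstract does not give the algorithm's details, so the function name `push_sum_count`, the synchronous rounds, the uniform peer selection, and the per-peer `local_counts` (standing in for the number of local documents matching a query) are all assumptions made for illustration.

```python
import random

def push_sum_count(local_counts, rounds=50, seed=0):
    """Simulate push-sum gossip; return each peer's estimate of sum(local_counts).

    Hypothetical illustration only, not the XGossip algorithm.
    """
    rng = random.Random(seed)
    n = len(local_counts)
    # Each peer holds a (sum, weight) pair. Only peer 0 starts with weight 1,
    # so sum/weight converges to the network total rather than the average.
    sums = [float(c) for c in local_counts]
    weights = [1.0 if i == 0 else 0.0 for i in range(n)]
    for _ in range(rounds):
        new_sums = [0.0] * n
        new_weights = [0.0] * n
        for i in range(n):
            target = rng.randrange(n)  # uniformly random peer (may be itself)
            half_s, half_w = sums[i] / 2, weights[i] / 2
            # Keep one half locally and push the other half to the target.
            new_sums[i] += half_s
            new_weights[i] += half_w
            new_sums[target] += half_s
            new_weights[target] += half_w
        sums, weights = new_sums, new_weights
    # A peer that has not yet received any weight has no estimate.
    return [s / w if w > 0 else None for s, w in zip(sums, weights)]

if __name__ == "__main__":
    counts = [3, 0, 5, 2, 7, 1, 0, 4]  # matching documents held by each peer
    print(push_sum_count(counts, rounds=60, seed=1))
```

Because each round only splits and forwards values, the totals of the sums and of the weights are conserved, which is why every peer's ratio approaches the true total without any central coordinator.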

Cite

Slavov, V. G., & Rao, P. R. (2011). Towards Internet-Scale Cardinality Estimation of XPath Queries over Distributed XML Data. NetDB Workshop in SIGMOD, 1–35.
