Towards Internet-Scale Cardinality Estimation of XPath Queries over Distributed XML Data

Vasil G Slavov; Praveen R Rao



NetDB workshop in SIGMOD (2011) 1-35


Abstract

In the last decade, we have witnessed the huge success of the peer-to-peer (P2P) computing model. This has led to the development of many Internet-scale applications and systems that are used commercially. Recently, the problem of computing statistics over data in Internet-scale systems has received attention. In this paper, we discuss the problem of cardinality estimation of XPath queries over distributed XML data stored in an Internet-scale environment such as a P2P network. Such cardinality estimates are useful for XQuery optimization and statistical hypothesis testing in domains such as health informatics. We present a novel gossip algorithm called XGossip, which, given an XPath query, estimates the number of XML documents that contain a match for the query. XGossip is designed to be scalable, decentralized, and robust to failures, properties that are desirable in a large-scale distributed system. XGossip employs a novel divide-and-conquer strategy for load balancing and reducing bandwidth consumption. We conduct theoretical analyses on the quality of cardinality estimates, message complexity, and bandwidth consumption. We present a preliminary performance evaluation on PlanetLab and discuss our ongoing work.
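The abstract does not spell out XGossip's protocol, so the following is only a minimal sketch of a generic push-sum style gossip aggregate, shown to illustrate how peers could jointly estimate a network-wide count without a coordinator. It is not XGossip and omits its divide-and-conquer strategy. The function name `push_sum`, the synchronous rounds, the uniform random peer choice, and the assumption that each peer already knows how many of its local documents match the query are all illustrative assumptions.

```python
import random


def push_sum(local_counts, rounds=60, seed=0):
    """Simulate push-sum gossip (hypothetical helper, not the paper's algorithm).

    local_counts[i] is the assumed number of documents at peer i that match
    the query. Returns each peer's estimate of sum(local_counts).
    """
    rng = random.Random(seed)
    n = len(local_counts)
    sums = [float(c) for c in local_counts]
    # Only peer 0 starts with weight 1, so sum/weight converges to the
    # network-wide total rather than the average.
    weights = [1.0] + [0.0] * (n - 1)

    for _ in range(rounds):
        new_sums = [0.0] * n
        new_weights = [0.0] * n
        for i in range(n):
            peer = rng.randrange(n)  # uniform random target (may be itself)
            half_s, half_w = sums[i] / 2, weights[i] / 2
            # Keep one half locally and push the other half to the peer.
            new_sums[i] += half_s
            new_weights[i] += half_w
            new_sums[peer] += half_s
            new_weights[peer] += half_w
        sums, weights = new_sums, new_weights

    # A peer that has not yet received any weight has no estimate.
    return [s / w if w > 0 else None for s, w in zip(sums, weights)]


if __name__ == "__main__":
    estimates = push_sum([3, 0, 5, 2, 7, 1, 0, 4])
    print(estimates)
```

Each round preserves the total of the sums and the total of the weights, which is why every peer's ratio drifts toward the same global count. A real deployment would also have to handle peer failures and message loss, which this simulation ignores.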


Slavov, V. G., & Rao, P. R. (2011). Towards Internet-Scale Cardinality Estimation of XPath Queries over Distributed XML Data. NetDB Workshop in SIGMOD, 1–35.
