Abstract
In the last decade, we have witnessed the huge success of the peer-to-peer (P2P) computing model. This has led to the development of many Internet-scale applications and systems that are used commercially. Recently, the problem of computing statistics over data in Internet-scale systems has received attention. In this paper, we discuss the problem of cardinality estimation of XPath queries over distributed XML data stored in an Internet-scale environment such as a P2P network. Such cardinality estimates are useful for XQuery optimization and statistical hypothesis testing in domains such as health informatics. We present a novel gossip algorithm called XGossip, which, given an XPath query, estimates the number of XML documents that contain a match for the query. XGossip is designed to be scalable, decentralized, and robust to failures – properties that are desirable in a large-scale distributed system. XGossip employs a novel divide-and-conquer strategy for load balancing and reducing bandwidth consumption. We conduct theoretical analyses of the quality of cardinality estimates, message complexity, and bandwidth consumption. We present a preliminary performance evaluation on PlanetLab and discuss our ongoing work.
Slavov, V. G., & Rao, P. R. (2011). Towards Internet-Scale Cardinality Estimation of XPath Queries over Distributed XML Data. NetDB Workshop in SIGMOD, 1–35.