In this contribution, we present our approach to querying an XML document stored in a distributed system. The main goal of this paper is to describe how to use the Spark SQL framework to implement a subset of expressions from the XPath query language. We introduce and compare five different methods within our approach, and in doing so we also demonstrate the current state of query optimization on the Spark SQL platform, which may be regarded as a further contribution of the paper. The subset of XPath supported by the implemented methods covers all axes except the attribute and namespace axes; predicates are not implemented in our prototype. We present the implemented system, data, measurements, tests, and results. The evaluation results support our belief that our method significantly decreases the data transfers that occur in the distributed system during query evaluation.
CITATION STYLE
Hricov, R., Šenk, A., Kroha, P., & Valenta, M. (2017). Evaluation of XPath queries over XML documents using SparkSQL framework. In Communications in Computer and Information Science (Vol. 716, pp. 28–41). Springer Verlag. https://doi.org/10.1007/978-3-319-58274-0_3