Approximate matching between large tree sets is widely used in applications such as data integration and XML de-duplication. However, most existing methods suffer from low efficiency and thus do not scale to large tree sets. pq-gram is a widely used method that yields high-quality matches. In this paper, we propose pq-hash as an improvement to pq-gram. As the basis of pq-hash, we develop a randomized data structure, pq-array, with which large trees are represented as small fixed-size arrays. Sort-merge and hash join techniques are applied to these pq-arrays to avoid nested-loop joins. Theoretical analysis and experimental results show that pq-hash retains high join quality while achieving much higher efficiency than pq-gram. © 2010 Springer-Verlag.
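To make the idea of representing a large tree as a small fixed-size array concrete, here is a minimal Python sketch. The abstract does not say how a pq-array is actually built, so the sketch swaps in a generic min-hash signature over the tree's pq-grams; it should not be read as the paper's construction. The names `pq_grams`, `gram_hash`, `sketch`, and `estimated_similarity`, the `(label, children)` tuple encoding of a tree, and the use of a set rather than a bag of pq-grams are all assumptions made for illustration.

```python
import hashlib


def pq_grams(tree, p=2, q=3):
    """Return the pq-grams of a tree given as (label, [children]).

    Each pq-gram is a tuple of p stem labels (the anchor node and its
    ancestors) followed by q consecutive child labels, padded with "*".
    """
    grams = []

    def visit(node, ancestors):
        label, children = node
        stem = (ancestors + (label,))[-p:]
        if not children:
            # A leaf contributes one pq-gram with an all-dummy base.
            grams.append(stem + ("*",) * q)
        else:
            labels = [child[0] for child in children]
            padded = ["*"] * (q - 1) + labels + ["*"] * (q - 1)
            for i in range(len(padded) - q + 1):
                grams.append(stem + tuple(padded[i:i + q]))
        for child in children:
            visit(child, stem)

    visit(tree, ("*",) * p)
    return grams


def gram_hash(gram, seed):
    """Hash one pq-gram under a given seed to a 64-bit integer."""
    data = (str(seed) + "\x00" + "\x00".join(gram)).encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def sketch(tree, size=16, p=2, q=3):
    """Fixed-size min-hash signature (assumed stand-in for a pq-array)."""
    grams = set(pq_grams(tree, p, q))  # assumption: set semantics, not bags
    return [min(gram_hash(g, seed) for g in grams) for seed in range(size)]


def estimated_similarity(a, b):
    """Fraction of matching positions between two equal-length signatures."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


t1 = ("a", [("b", []), ("c", [])])
t2 = ("a", [("b", []), ("d", [])])
s1, s2 = sketch(t1), sketch(t2)
print(len(s1), estimated_similarity(s1, s1), estimated_similarity(s1, s2))
```

Because every tree maps to an array of the same length, the signatures can be sorted or hashed position by position, which is the kind of property that lets a join avoid comparing all pairs of trees in a nested loop.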
Li, F., Wang, H., Hao, L., Li, J., & Gao, H. (2010). pq-Hash: An Efficient method for approximate XML joins. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6185 LNCS, pp. 125–134). https://doi.org/10.1007/978-3-642-16720-1_13