pq-Hash: An Efficient method for approximate XML joins

Abstract

Approximate matching between large tree sets is broadly used in many applications such as data integration and XML de-duplication. However, most existing methods suffer from low efficiency and thus do not scale to large tree sets. pq-gram is a widely used method that yields high-quality matches. In this paper, we propose pq-hash as an improvement on pq-gram. pq-hash is built on pq-array, a randomized data structure that represents large trees as small fixed-size arrays. Sort-merge and hash join techniques are applied to these pq-arrays to avoid nested-loop joins. Theoretical analysis and experimental results show that pq-hash retains high join quality while achieving much higher efficiency than pq-gram. © 2010 Springer-Verlag.
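
The abstract does not say how a pq-array is built, so the following is only an illustrative sketch of the general idea, not the authors' construction. It assumes a min-hash style signature over a tree's pq-grams (treated as a set rather than a bag, for simplicity) and a hash join on (slot, value) pairs. The names `pq_grams`, `pq_array`, and `approx_join`, the tuple-based tree encoding, and the parameters `k` and `threshold` are all hypothetical.

```python
import hashlib
from collections import defaultdict

# A tree is a (label, children) tuple, e.g. ("a", [("b", []), ("c", [])]).

def pq_grams(tree, p=2, q=3):
    """Return the set of pq-grams of a tree: p ancestor labels
    followed by q consecutive sibling labels, padded with '*'."""
    grams = set()

    def visit(node, ancestors):
        label, children = node
        ancestors = ancestors[1:] + (label,)
        siblings = ("*",) * q
        if not children:
            grams.add(ancestors + siblings)
            return
        for child in children:
            siblings = siblings[1:] + (child[0],)
            grams.add(ancestors + siblings)
            visit(child, ancestors)
        for _ in range(q - 1):
            siblings = siblings[1:] + ("*",)
            grams.add(ancestors + siblings)

    visit(tree, ("*",) * p)
    return grams

def _hash(seed, gram):
    """Deterministic 64-bit hash of a pq-gram under a given seed."""
    digest = hashlib.blake2b(
        repr(gram).encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")

def pq_array(tree, k=16, p=2, q=3):
    """ASSUMPTION: a fixed-size array of k min-hash values standing in
    for the paper's pq-array. The array length is k for any tree size."""
    grams = pq_grams(tree, p, q)
    return [min(_hash(seed, g) for g in grams) for seed in range(k)]

def approx_join(left, right, k=16, threshold=0.5):
    """Hash join on (slot, value) pairs. Only trees sharing at least one
    slot value are compared, so there is no nested loop over all pairs."""
    index = defaultdict(list)
    for j, tree in enumerate(right):
        for slot, value in enumerate(pq_array(tree, k)):
            index[(slot, value)].append(j)

    matches = defaultdict(int)
    for i, tree in enumerate(left):
        for slot, value in enumerate(pq_array(tree, k)):
            for j in index.get((slot, value), ()):
                matches[(i, j)] += 1

    # The fraction of agreeing slots serves as the similarity estimate.
    return sorted(
        (i, j, count / k)
        for (i, j), count in matches.items()
        if count / k >= threshold
    )
```

In this sketch, two trees with identical pq-gram sets always produce identical arrays, and the fraction of matching slots estimates the overlap of their pq-gram sets. A larger `k` would tighten the estimate at the cost of a larger array.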

Cite

Li, F., Wang, H., Hao, L., Li, J., & Gao, H. (2010). pq-Hash: An Efficient method for approximate XML joins. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 6185 LNCS, pp. 125–134). https://doi.org/10.1007/978-3-642-16720-1_13