Efficient similarity search for tree-structured data

Abstract

Tree-structured data are becoming ubiquitous, and manipulating them based on similarity is essential for many applications. Although similarity search on textual data has been extensively studied, searching for similar trees is still an open problem because computing the similarity between trees is expensive, especially for large numbers of trees. In this paper, we propose to transform tree-structured data into strings with a one-to-one mapping. We prove that the edit distance between the corresponding strings bounds the similarity measures between trees, including tree edit distance, largest common subtrees and smallest common super-trees. Based on this theoretical analysis, any existing algorithm for approximate string search can be employed for effective similarity search on trees. Moreover, we embed the bound into a filter-and-refine framework to facilitate similarity search on tree-structured data. The experimental results show that our algorithm achieves high performance and significantly outperforms state-of-the-art methods. Our method is especially suitable for accelerating similarity query processing on large numbers of trees in massive datasets. © 2008 Springer-Verlag.
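The filter-and-refine idea can be illustrated with a minimal sketch. The abstract does not specify the paper's one-to-one mapping, so this sketch substitutes a simpler, well-known stand-in: the preorder label sequence of an ordered labeled tree. Each tree edit operation (relabel, insert, or delete one node) changes that sequence by at most one symbol, so the string edit distance is a lower bound on the tree edit distance. Unlike the paper's mapping, the preorder sequence is not one-to-one. The tree representation `(label, children)`, all function names, and the `exact_distance` parameter are assumptions made for illustration only.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical tree representation for this sketch: (label, [child, child, ...]).
Tree = Tuple[str, list]


def preorder_labels(tree: Tree) -> List[str]:
    """Flatten an ordered labeled tree into its preorder label sequence.

    Stand-in for the paper's mapping; this one is NOT one-to-one.
    """
    label, children = tree
    out = [label]
    for child in children:
        out.extend(preorder_labels(child))
    return out


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance using two rolling rows of the DP table."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def filter_candidates(query: Tree, trees: List[Tree], threshold: int) -> List[Tree]:
    """Filter step: keep trees whose string lower bound is within the threshold.

    A tree whose lower bound already exceeds the threshold cannot be
    within the threshold under tree edit distance, so it is pruned.
    """
    q = preorder_labels(query)
    return [t for t in trees if edit_distance(q, preorder_labels(t)) <= threshold]


def similarity_search(
    query: Tree,
    trees: List[Tree],
    threshold: int,
    exact_distance: Callable[[Tree, Tree], int],
) -> List[Tree]:
    """Refine step: run the (assumed, caller-supplied) exact tree distance
    only on the candidates that survive the filter."""
    survivors = filter_candidates(query, trees, threshold)
    return [t for t in survivors if exact_distance(query, t) <= threshold]
```

In use, `exact_distance` would be an exact tree edit distance routine supplied by the caller; the filter only reduces how often that expensive routine runs and never discards a true answer, provided the string distance really is a lower bound for the chosen measure.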

Cite

Li, G., Liu, X., Feng, J., & Zhou, L. (2008). Efficient similarity search for tree-structured data. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5069 LNCS, pp. 131–149). https://doi.org/10.1007/978-3-540-69497-7_11
