Evaluate Structure Similarity in XML Documents with Merge-Edit-Distance

Abstract

XML is widely used as a standard for data representation and exchange among Web applications. In recent years, much effort has gone into querying, integrating and clustering XML documents. Measuring the similarity among XML documents is the foundation of such applications. In this paper, we propose a new similarity measure for XML documents based on Merge-Edit-Distance (MED). MED preserves the distribution information of the common tree in XML document trees. We argue that this distribution information is useful for determining the similarity of XML documents. We also propose a novel algorithm to calculate MED. Given two XML document trees A and B, it compresses the two trees into one merge tree C and then transforms C into the common tree of A and B using defined operations such as "Delete", "Reduce" and "Combine". The cost of the operation sequence is defined as MED. Experiments on real datasets provide evidence that the proposed similarity measure is effective. © Springer-Verlag Berlin Heidelberg 2007.
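
The merge-then-transform idea in the abstract can be illustrated with a rough sketch. This is not the paper's algorithm: the abstract does not define "Reduce" or "Combine" or give operation costs, so the code below models only a unit-cost "Delete", matches children greedily by label, and assumes both roots share a label. All names (`Node`, `MergeNode`, `merge`, `delete_cost`, `common_tree`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)


@dataclass
class MergeNode:
    label: str
    in_a: bool  # node occurs in tree A
    in_b: bool  # node occurs in tree B
    children: List["MergeNode"] = field(default_factory=list)


def lift(node: Node, in_a: bool, in_b: bool) -> MergeNode:
    """Copy a subtree that occurs in only one of the two inputs."""
    return MergeNode(node.label, in_a, in_b,
                     [lift(c, in_a, in_b) for c in node.children])


def merge(a: Node, b: Node) -> MergeNode:
    """Overlay A and B into one merge tree (assumption: greedy label matching)."""
    if a.label != b.label:
        raise ValueError("this sketch assumes both roots share a label")
    merged = MergeNode(a.label, True, True)
    unmatched_b = list(b.children)
    for ca in a.children:
        idx = next((i for i, cb in enumerate(unmatched_b)
                    if cb.label == ca.label), None)
        if idx is None:
            merged.children.append(lift(ca, True, False))
        else:
            merged.children.append(merge(ca, unmatched_b.pop(idx)))
    merged.children.extend(lift(cb, False, True) for cb in unmatched_b)
    return merged


def delete_cost(m: MergeNode) -> int:
    """Assumed unit cost: one Delete per node that is not shared by A and B."""
    own = 0 if (m.in_a and m.in_b) else 1
    return own + sum(delete_cost(c) for c in m.children)


def common_tree(m: MergeNode) -> Optional[Node]:
    """Keep only the nodes shared by both trees."""
    if not (m.in_a and m.in_b):
        return None
    kids = [common_tree(c) for c in m.children]
    return Node(m.label, [k for k in kids if k is not None])


# Two small, made-up document trees.
a = Node("book", [Node("title"), Node("author"), Node("year")])
b = Node("book", [Node("title"), Node("author"),
                  Node("publisher", [Node("name")])])
c = merge(a, b)
print(delete_cost(c))                               # cost of the Delete-only sequence
print([k.label for k in common_tree(c).children])   # labels under the shared root
```

In this simplified setting the cost is just the number of unshared nodes. The measure described in the abstract additionally accounts for how the common tree is distributed across the two documents, which this sketch does not attempt to capture.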

Cite

Zhou, C., Lu, Y., Zou, L., & Hu, R. (2007). Evaluate Structure Similarity in XML Documents with Merge-Edit-Distance. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4819 LNAI, pp. 301–311). Springer Verlag. https://doi.org/10.1007/978-3-540-77018-3_31
