A tree-based approach to clustering XML documents by structure

66Citations
Citations of this article
21Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

We propose a novel methodology for clustering XML documents on the basis of their structural similarities. The idea is to equip each cluster with an XML cluster representative, i.e. an XML document subsuming the most typical structural specifics of a set of XML documents. Clustering is essentially accomplished by comparing cluster representatives, and updating the representatives as soon as new clusters are detected. We present an algorithm for the computation of an XML representative based on suitable techniques for identifying significant node matchings and for reliably merging and pruning XML trees. Experimental evaluation performed on both synthetic and real data shows the effectiveness of our approach. © Springer-Verlag Berlin Heidelberg 2004.

Cite

CITATION STYLE

APA

Costa, G., Manco, G., Ortale, R., & Tagarelli, A. (2004). A tree-based approach to clustering XML documents by structure. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3202, 137–148. https://doi.org/10.1007/978-3-540-30116-5_15

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free