Classifying XML documents based on structure/content similarity

Guangming Xing; Jinhua Guo; Zhonghang Xia

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2007), vol. 4518 LNCS, pp. 444-457

DOI: 10.1007/978-3-540-73888-6_42

12 Citations

4 Readers

Abstract

In this paper, we present a framework for classifying XML documents based on the structure/content similarity between them. First, we propose an algorithm for computing the edit distance between an ordered labeled tree and a regular hedge grammar. This edit distance gives a more precise measure of structural similarity than existing distance metrics in the literature. Second, we study schema extraction from XML documents and give an effective solution based on the minimum description length (MDL) principle. Our schema extraction method allows a trade-off between schema simplicity and precision according to the user's specification. Third, we discuss the classification of XML documents, including their representation based on structure and content. The efficacy and efficiency of our methodology have been tested using the data sets from the XML Mining Challenge. © Springer-Verlag Berlin Heidelberg 2007.

Cite

APA

Xing, G., Guo, J., & Xia, Z. (2007). Classifying XML documents based on structure/content similarity. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 4518 LNCS, pp. 444–457). Springer Verlag. https://doi.org/10.1007/978-3-540-73888-6_42
