Corpus-based structure mapping of XML document corpora: A reinforcement learning based model

Francis Maes; Ludovic Denoyer; Patrick Gallinari

Journal Article

Studies in Computational Intelligence (2011) 370 249-266

DOI: 10.1007/978-3-642-22613-7_13

Abstract

We address the problem of automatically learning to map flat and semi-structured documents onto a mediated target XML schema. This problem is motivated by the recent development of applications for searching and mining semi-structured document sources and corpora. Academic research has mainly dealt with homogeneous collections. In practical applications, data come from multiple heterogeneous sources, and mining such collections requires defining a mapping, or correspondence, between the different document formats. Automating the design of such mappings has rapidly become a key issue for these applications. We propose a machine learning approach to this problem in which the mapping is learned from pairs of input documents and corresponding target documents provided by a user. The mapping process is formalized as a Markov Decision Process, and training is performed through a classical machine learning framework known as Reinforcement Learning. The resulting model is able to cope with complex mappings while keeping linear complexity. We describe a set of experiments on several corpora representative of different mapping tasks and show that the method learns mappings with high accuracy across these corpora. © 2011 Springer-Verlag Berlin Heidelberg.
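To make the idea of treating a mapping as a Markov Decision Process more concrete, the following is a minimal sketch and not the authors' model. It assumes a toy setting in which a flat document is a list of text elements, each action assigns one element a tag from a hypothetical three-tag target schema, and the reward is 1 when the tag matches the user-provided target document. Training uses plain tabular Q-learning, swapped in here for illustration; the feature function, the schema, the corpus, and all hyperparameters are assumptions made for the example.

```python
import random
import xml.etree.ElementTree as ET
from collections import defaultdict

LABELS = ["title", "author", "para"]  # hypothetical target schema

# Hypothetical training corpus: (flat input elements, target tag per element).
CORPUS = [
    (["XML MAPPING", "Doe, Roe", "We map documents.", "Results follow."],
     ["title", "author", "para", "para"]),
    (["STRUCTURE LEARNING", "Poe, Low", "Sources differ.", "We learn a policy."],
     ["title", "author", "para", "para"]),
]


def state(text, prev_label):
    # Toy state: two hand-picked features of the current element
    # plus the previously chosen action.
    return (text.isupper(), "," in text, prev_label)


def greedy(q, s):
    # Highest-valued action; ties go to the first label in LABELS.
    return max(LABELS, key=lambda a: q[(s, a)])


def train(corpus, episodes=2000, alpha=0.2, gamma=0.5, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # Q-values, default 0.0
    for _ in range(episodes):
        texts, targets = rng.choice(corpus)
        prev = None
        for i, text in enumerate(texts):
            s = state(text, prev)
            # Epsilon-greedy exploration.
            a = rng.choice(LABELS) if rng.random() < epsilon else greedy(q, s)
            reward = 1.0 if a == targets[i] else 0.0
            if i + 1 < len(texts):
                s_next = state(texts[i + 1], a)
                future = max(q[(s_next, b)] for b in LABELS)
            else:
                future = 0.0  # end of document: terminal state
            # Standard Q-learning update.
            q[(s, a)] += alpha * (reward + gamma * future - q[(s, a)])
            prev = a
    return q


def map_document(q, texts):
    # One greedy pass over the input, so decoding cost grows linearly
    # with the number of elements.
    root = ET.Element("doc")
    prev = None
    for text in texts:
        prev = greedy(q, state(text, prev))
        ET.SubElement(root, prev).text = text
    return ET.tostring(root, encoding="unicode")
```

Calling `map_document(train(CORPUS), [...])` on an unseen flat document returns an XML string whose tags follow the learned greedy policy. The single left-to-right pass is what keeps inference linear in this sketch; the features and flat schema used here are far simpler than what a real mapping task would need.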

Cite

Maes, F., Denoyer, L., & Gallinari, P. (2011). Corpus-based structure mapping of XML document corpora: A reinforcement learning based model. Studies in Computational Intelligence, 370, 249–266. https://doi.org/10.1007/978-3-642-22613-7_13
