Corpus-based structure mapping of XML document corpora: A reinforcement learning based model

Abstract

We address the problem of automatically learning to map flat and semi-structured documents onto a mediated target XML schema. The problem is motivated by the recent development of applications for searching and mining semi-structured document sources and corpora. Academic research has mainly dealt with homogeneous collections. In practical applications, however, data come from multiple heterogeneous sources, and mining such collections requires defining a mapping, or correspondence, between the different document formats. Automating the design of such mappings has rapidly become a key issue for these applications. We propose a machine learning approach in which the mapping is learned from pairs of input documents and corresponding target documents provided by a user. The mapping process is formalized as a Markov Decision Process, and training is performed with Reinforcement Learning, a classical machine learning framework. The resulting model copes with complex mappings while keeping linear complexity. We describe a set of experiments on several corpora representative of different mapping tasks and show that the method learns mappings with high accuracy across them. © 2011 Springer-Verlag Berlin Heidelberg.
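
As an illustration only, the idea of treating a mapping as a sequential decision process can be sketched with tabular Q-learning on a toy task. This is not the model described in the chapter: the toy corpus, the surface features (`caps`, `name`, `text`), the tag set, the reward (1 for a correct tag, 0 otherwise), and the names `train_q` and `map_document` are all assumptions made for this sketch. Each state is the current element's feature together with the previously assigned tag, and each action assigns one target tag.

```python
import random
from collections import defaultdict

# Hypothetical toy corpus: each flat document is a list of
# (surface_feature, target_tag) pairs, standing in for user-provided
# input/target document pairs.
TRAIN_DOCS = [
    [("caps", "title"), ("name", "author"), ("text", "body"), ("text", "body")],
    [("caps", "title"), ("text", "body")],
    [("caps", "title"), ("name", "author"), ("name", "author"), ("text", "body")],
]
TAGS = ["title", "author", "body"]


def train_q(docs, episodes=2000, alpha=0.5, gamma=0.5, epsilon=0.3, seed=0):
    """Tabular Q-learning; one episode labels one document left to right."""
    rng = random.Random(seed)
    q = defaultdict(float)  # maps (state, action) to an estimated return
    for _ in range(episodes):
        doc = rng.choice(docs)
        prev_tag = "START"
        for i, (feature, gold_tag) in enumerate(doc):
            state = (feature, prev_tag)
            # Epsilon-greedy action selection over the target tags.
            if rng.random() < epsilon:
                action = rng.choice(TAGS)
            else:
                action = max(TAGS, key=lambda t: q[(state, t)])
            # Assumed reward: 1 when the chosen tag matches the target.
            reward = 1.0 if action == gold_tag else 0.0
            if i + 1 < len(doc):
                next_state = (doc[i + 1][0], action)
                future = max(q[(next_state, t)] for t in TAGS)
            else:
                future = 0.0  # end of document: terminal state
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            prev_tag = action
    return q


def map_document(q, features):
    """Greedy pass over the input: one decision per element."""
    prev_tag, tags = "START", []
    for feature in features:
        tag = max(TAGS, key=lambda t: q[((feature, prev_tag), t)])
        tags.append(tag)
        prev_tag = tag
    return tags


q_table = train_q(TRAIN_DOCS)
print(map_document(q_table, ["caps", "name", "text"]))
```

In this sketch, inference makes a single greedy decision per input element, so its cost grows linearly with document length, which is the kind of property the abstract refers to. A realistic system would need richer state features and a learned function approximator rather than a lookup table.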

Cite

Maes, F., Denoyer, L., & Gallinari, P. (2011). Corpus-based structure mapping of XML document corpora: A reinforcement learning based model. Studies in Computational Intelligence, 370, 249–266. https://doi.org/10.1007/978-3-642-22613-7_13