We present an integrated framework for converting documents from legacy formats to XML. We describe the LegDoC project, which aims to automate the conversion of layout annotations in layout-oriented formats such as PDF, PS, and HTML into semantic-oriented annotations. A toolkit of components covers complementary techniques: logical document analysis and semantic annotation using machine learning methods. We use a real conversion project as a running example to illustrate the techniques implemented in the project. © Springer-Verlag Berlin Heidelberg 2005.
CITATION STYLE
Chanod, J. P., Chidlovskii, B., Dejean, H., Fambon, O., Fuselier, J., Jacquin, T., & Meunier, J. L. (2005). From legacy documents to XML: A conversion framework. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3652 LNCS, pp. 92–103). https://doi.org/10.1007/11551362_9