Abstract
We present a system for converting legacy PDF documents into structured XML. The system first extracts the different streams contained in PDF files (text, bitmap images, and vector images) and then applies a series of components to express the logically structured documents in XML. Some of these components are traditional in document analysis; others are more specific to PDF. We also present a graphical user interface for checking, correcting, and validating the analysis performed by the components. Finally, we report on two real use cases in which the system was applied. © Springer-Verlag Berlin Heidelberg 2006.
Cite
Déjean, H., & Meunier, J. L. (2006). A system for converting PDF documents into structured XML format. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3872 LNCS, pp. 129–140). Springer Verlag. https://doi.org/10.1007/11669487_12