A system for converting PDF documents into structured XML format

50Citations
Citations of this article
47Readers
Mendeley users who have this article in their library.

This article is free to access.

Abstract

We present in this paper a system for converting PDF legacy documents into structured XML format. This conversion system first extracts the different streams contained in PDF files (text, bitmap and vectorial images) and then applies different components in order to express in XML the logically structured documents. Some of these components are traditional in Document Analysis, other more specific to PDF. We also present a graphical user interface in order to check, correct and validate the analysis of the components. We eventually report on two real user cases where this system was applied on. © Springer-Verlag Berlin Heidelberg 2006.

Cite

CITATION STYLE

APA

Déjean, H., & Meunier, J. L. (2006). A system for converting PDF documents into structured XML format. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3872 LNCS, pp. 129–140). Springer Verlag. https://doi.org/10.1007/11669487_12

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free