A study on information extraction from PDF files

Abstract

Portable Document Format (PDF) is increasingly recognized as a common format for electronic documents. A prerequisite for managing and indexing PDF files is extracting information from them. This paper describes an approach for extracting information from PDF files. The key idea is to transform the text parsed from PDF files into semi-structured information by injecting additional uniform tags. An extensible rule set is built on these tags and on other knowledge. Guided by the rules, a pattern-matching algorithm based on a tree model is applied to obtain the required information. An experiment showed that the method is effective. © Springer-Verlag Berlin Heidelberg 2006.
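
The abstract does not specify the tag vocabulary, the rule syntax, or the tree-matching algorithm, so the following is only a minimal Python sketch of the general idea under stated assumptions. It assumes a PDF parser (not shown) has already produced text lines with a font size and a bold flag; uniform tags are injected from those features; the tagged lines are grouped into a shallow tree; and a small rule table drives a depth-first pattern match over that tree. All names here (Line, Node, inject_tags, build_tree, RULES, match, extract), the tags LARGE/BOLD/TEXT, and the 1.3 font-size threshold are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field

# Hypothetical parser output: one record per text line (not the paper's format).
@dataclass
class Line:
    text: str
    font_size: float
    bold: bool = False

# A node in the tagged document tree.
@dataclass
class Node:
    tag: str
    text: str = ""
    children: list["Node"] = field(default_factory=list)

def inject_tags(lines: list[Line], body_size: float) -> list[Node]:
    """Wrap each raw line in a uniform tag derived from simple layout features."""
    nodes = []
    for ln in lines:
        if ln.font_size > body_size * 1.3:  # assumed threshold for "large" text
            tag = "LARGE"
        elif ln.bold:
            tag = "BOLD"
        else:
            tag = "TEXT"
        nodes.append(Node(tag, ln.text.strip()))
    return nodes

def build_tree(nodes: list[Node]) -> Node:
    """Attach each TEXT node to the most recent LARGE or BOLD node (or the root)."""
    root = Node("DOC")
    current = root
    for n in nodes:
        if n.tag in ("LARGE", "BOLD"):
            root.children.append(n)
            current = n
        else:
            current.children.append(n)
    return root

# Hypothetical rule table: field -> (parent tag, keyword in parent text or None, child tag).
RULES = {
    "title": ("DOC", None, "LARGE"),
    "authors": ("LARGE", None, "TEXT"),
    "abstract": ("BOLD", "abstract", "TEXT"),
}

def match(node: Node, parent_tag: str, keyword, child_tag: str) -> list[str]:
    """Depth-first search for the first node matching the rule; return its matching children's text."""
    if node.tag == parent_tag and (keyword is None or keyword in node.text.lower()):
        hits = [c.text for c in node.children if c.tag == child_tag]
        if hits:
            return hits
    for c in node.children:
        found = match(c, parent_tag, keyword, child_tag)
        if found:
            return found
    return []

def extract(root: Node, rules: dict) -> dict[str, str]:
    """Apply every rule to the tree and join the matched text per field."""
    return {name: " ".join(match(root, *rule)) for name, rule in rules.items()}

if __name__ == "__main__":
    lines = [
        Line("A Study on Extraction", 18),
        Line("F. Yuan, B. Liu", 10),
        Line("Abstract", 10, bold=True),
        Line("PDF is common.", 10),
        Line("We add tags.", 10),
    ]
    root = build_tree(inject_tags(lines, body_size=10))
    print(extract(root, RULES))
```

Keeping the rules as data rather than code is one way to make a rule set extensible: supporting a new field only requires a new entry in the table. A practical system would need a richer tag set, deeper trees, and more expressive patterns than this sketch provides.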

Citation (APA):

Yuan, F., Liu, B., & Yu, G. (2006). A study on information extraction from PDF files. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 3930 LNAI, pp. 258–267). Springer Verlag. https://doi.org/10.1007/11739685_27