High Performance Document Layout Analysis

  • Breuel T

Abstract

In this paper, I summarize research in document layout analysis carried out over the last few years in our laboratory. Correct document layout analysis is a key step in document capture conversions into electronic formats, optical character recognition (OCR), information retrieval from scanned documents, appearance-based document retrieval, and reformatting of documents for on-screen display. We have developed a number of novel geometric algorithms and statistical methods. Layout analysis systems built from these algorithms are applicable to a wide variety of languages and layouts, and have proven to be robust to the presence of noise and spurious features in a page image. The system itself consists of reusable and independent software modules that can be reconfigured to be adapted to different languages and applications. Currently, we are using them for electronic book and document capture applications. If there is commercial or government demand, we are interested in adapting these tools to information retrieval and intelligence applications.

Cite

Breuel, T. M. (2003). High Performance Document Layout Analysis. Proceedings 2003 Symposium on Document Image Understanding Technology, 03, 209–218. Retrieved from http://books.google.com/books?hl=en&lr=&id=Rw7f-vuaX7IC&oi=fnd&pg=PA209&dq=High+performance+document+layout+analysis&ots=LtngWLoU_a&sig=nP4wmQveoTytxQIy2aUdBgKj0mQ
