Abstract
The conversion of documents into electronic form has proved more difficult than anticipated. Document image analysis still accounts for only a small fraction of the rapidly expanding document imaging market. Nevertheless, the optimism manifested over the last thirty years has not dissipated. Driven partly by document distribution on CD-ROM and via the World Wide Web, there is more interest in the preservation of layout and format attributes to increase legibility (sometimes called “page reconstruction”) than in mere text/non-text separation. The realization that accurate document image analysis requires fairly specific pre-stored information has resulted in the investigation of new data structures for knowledge bases and for the representation of the results of partial analysis. At the same time, the requirements of downstream software, such as word processing, information retrieval, and computer-aided design applications, favor turning the results of analysis and recognition into a standard format such as SGML or DXF. There is increased emphasis on large-scale, automated comparative evaluation using laboriously compiled test databases. The cost of generating these databases has stimulated new research on synthetic noise models. According to recent publications, the accurate conversion of business letters, technical reports, large typeset repositories such as patents, postal addresses, specialized line drawings, and office forms containing a mix of handprinted, handwritten, and printed material is finally on the verge of success.
Nagy, G. (1995). Document image analysis: What is missing? In Lecture Notes in Computer Science (Vol. 974, pp. 576–587). Springer. https://doi.org/10.1007/3-540-60298-4_317