The State of the Art of Document Image Degradation Modelling

  • Baird H

Abstract

This chapter reviews the literature and the scientific and engineering state of the art of models of document-image degradation. Images of paper documents are almost inevitably degraded in the course of printing, photocopying, faxing, scanning, and the like. This loss of quality, even when it appears negligible to human eyes, can cause an abrupt drop in the accuracy of the present generation of text recognition (OCR) systems. This fragility of OCR systems in the face of low image quality is well known to serious users as well as OCR engineers and has been illustrated compellingly in large-scale experiments carried out at the Information Science Research Institute of the University of Nevada ([55] through [53]). In addition, there is growing evidence that significant improvement in accuracy on recalcitrant image pattern recognition problems now depends as much on the size and representativeness of training sets as on the choice of features and classification algorithms. To mention only one example, a US National Institute of Standards and Technology (NIST) competition on hand-printed digits [67] had a surprising outcome: the competitor with the highest accuracy ignored the training set offered by NIST, using instead its own, much larger, set; furthermore, in spite of widely divergent algorithms, most of the competitors who used the same training set were tightly clustered in accuracy; and one of the most promising attacks relied on perhaps the oldest and simplest of algorithms, nearest-neighbour classification [58]. These observations suggest that large improvements in accuracy may be achievable through, and perhaps only through, deeper scientific understanding of image quality and the representativeness of image data sets. Such a research programme may be expected to assist engineers by allowing

Baird, H. S. (2007). The State of the Art of Document Image Degradation Modelling (pp. 261–279). https://doi.org/10.1007/978-1-84628-726-8_12
