On OCR of degraded documents using fuzzy multifactorial analysis

U. Garain; B. B. Chaudhuri

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2002) 2275 388-394

DOI: 10.1007/3-540-45631-7_52

Abstract

Optical Character Recognition (OCR) systems perform poorly on documents such as old books or newspapers, photocopied materials, and faxed documents. Such documents are regarded as degraded documents. An important reason for the poor recognition rate on degraded documents is the presence of touching or connected characters, which makes it difficult to design an effective character segmentation procedure. In this paper, a new technique based on fuzzy multifactorial analysis is proposed for segmenting touching characters. A predictive algorithm is developed for selecting the cut-points at which touching characters are segmented. The method has initially been applied to touching characters in Devnagari (Hindi) and Bangla, two major scripts of the Indian subcontinent. Results on a test set of considerable size show that a high recognition rate can be achieved with a reasonable amount of computation.
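
The abstract does not list the factors or the combining function the authors use, so the following is only a minimal sketch of the general idea of multifactorial cut-point selection, not the paper's algorithm. It assumes a binary image given as rows of 0/1 pixels and two hypothetical factors (how little ink a column contains, and how close it is to the middle of the component), each mapped to a membership value in [0, 1] and combined by a weighted average. The function names and the weights `w_ink` and `w_center` are illustrative assumptions.

```python
from typing import List


def column_ink(image: List[List[int]]) -> List[int]:
    """Count foreground (1) pixels in each column of a binary image."""
    return [sum(col) for col in zip(*image)]


def cut_point_scores(image: List[List[int]],
                     w_ink: float = 0.6,
                     w_center: float = 0.4) -> List[float]:
    """Score every column as a candidate cut-point.

    Both factors and both weights are assumptions made for illustration;
    the source does not specify them.
    """
    height = len(image)
    width = len(image[0])
    ink = column_ink(image)
    center = (width - 1) / 2
    scores = []
    for x in range(width):
        # Factor 1 (assumed): membership is high when the column has little ink.
        thin = 1.0 - ink[x] / height
        # Factor 2 (assumed): membership is high near the middle of the component.
        central = 1.0 - abs(x - center) / max(center, 1)
        # Weighted average keeps the combined score in [0, 1].
        scores.append(w_ink * thin + w_center * central)
    return scores


def best_cut_point(image: List[List[int]]) -> int:
    """Return the column index with the highest combined score."""
    scores = cut_point_scores(image)
    return max(range(len(scores)), key=scores.__getitem__)


# Two blobs joined by a one-pixel bridge in column 3.
touching = [
    [1, 1, 1, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1, 1],
]
cut = best_cut_point(touching)
```

In this toy input the bridge column has the least ink and is also the most central, so it receives the highest score. A real system would need additional factors and tuned weights.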

Cite

Garain, U., & Chaudhuri, B. B. (2002). On OCR of degraded documents using fuzzy multifactorial analysis. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2275, pp. 388–394). Springer Verlag. https://doi.org/10.1007/3-540-45631-7_52
