Document Image Analysis Using Imagemagick and Tesseract-ocr

Prof. Smitha M L; Dr. Antony P J; Sachin D N

Journal article, open access

IARJSET (2016) 3(5) 108-112

DOI: 10.17148/iarjset.2016.3523

Abstract

Document image analysis is the field of converting paper documents into an editable electronic representation by performing optical character recognition (OCR). In recent years, there has been a great deal of progress in the development of open-source OCR systems. The Tesseract OCR engine, which was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy, is described in a comprehensive overview. Emphasis is placed on aspects that are novel or at least unusual in an OCR engine, in particular the line finding, the features/classification methods, and the adaptive classifier. OCRopus is one of the leading open-source document analysis systems; it uses Tesseract OCR and has a modular, pluggable architecture. ImageMagick is an open-source image processing tool. This paper presents an overview of the steps involved in a document image analysis system and illustrates them with examples from a combination of ImageMagick and OCRopus.
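The abstract names ImageMagick for image preprocessing and Tesseract for recognition but does not give the commands used. The following is a minimal Python sketch of such a two-stage pipeline, assuming ImageMagick 7 (`magick`) and Tesseract 4 or later are installed and on the PATH. The helper names (`preprocess_cmd`, `ocr_cmd`, `run_pipeline`), the `_clean.tif` output name, and the specific preprocessing options (grayscale, normalize, deskew, fixed 50% threshold) are hypothetical illustrative choices, not the authors' published settings. The sketch also calls Tesseract directly rather than going through OCRopus.

```python
import subprocess
from pathlib import Path


def preprocess_cmd(src, dst, threshold=50, deskew=40, exe="magick"):
    """Build an ImageMagick command that cleans a scanned page for OCR.

    Hypothetical helper; option values are illustrative, not the paper's.
    exe: "magick" for ImageMagick 7; pass "convert" for ImageMagick 6.
    """
    return [
        exe, str(src),
        "-colorspace", "Gray",          # drop colour information
        "-normalize",                   # stretch contrast to the full range
        "-deskew", f"{deskew}%",        # straighten a slightly rotated scan
        "+repage",                      # reset the virtual canvas after rotation
        "-threshold", f"{threshold}%",  # binarize with a fixed global threshold
        str(dst),
    ]


def ocr_cmd(image, lang="eng", psm=3):
    """Build a Tesseract command that writes recognized text to stdout.

    Hypothetical helper. --psm 3 selects fully automatic page segmentation
    (Tesseract 4+ spelling; Tesseract 3.x used -psm).
    """
    return ["tesseract", str(image), "stdout", "-l", lang, "--psm", str(psm)]


def run_pipeline(src, workdir="."):
    """Preprocess one page with ImageMagick, then OCR it with Tesseract."""
    cleaned = Path(workdir) / (Path(src).stem + "_clean.tif")
    subprocess.run(preprocess_cmd(src, cleaned), check=True)
    result = subprocess.run(ocr_cmd(cleaned), check=True,
                            capture_output=True, text=True)
    return result.stdout
```

For example, `print(run_pipeline("scan.png"))` would write `scan_clean.tif` to the working directory and print the recognized text. Building the command lists separately from running them makes the preprocessing options easy to inspect or tune for a given class of documents before any external program is invoked.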

Cite (APA):

Smitha, M. L., Antony, P. J., & Sachin, D. N. (2016). Document Image Analysis Using Imagemagick and Tesseract-ocr. IARJSET, 3(5), 108–112. https://doi.org/10.17148/iarjset.2016.3523
