Document image analysis is the field of converting paper documents into an editable electronic representation by performing optical character recognition (OCR). In recent years, open source OCR systems have improved considerably. A comprehensive overview is given of the tesseract-ocr engine, which was the HP Research Prototype in the UNLV Fourth Annual Test of OCR Accuracy. Emphasis is placed on aspects that are novel, or at least unusual, in an OCR engine, in particular the line finding, the features/classification methods, and the adaptive classifier. OCRopus is one of the leading open source document analysis systems; it uses tesseract-ocr within a modular, pluggable architecture. ImageMagick is an open source image processing tool. This paper presents an overview of the steps involved in a document image analysis system and illustrates them with examples that combine ImageMagick and OCRopus.
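A pipeline of this kind can be sketched as two command-line steps: ImageMagick cleans up the scanned page, then Tesseract recognizes the text. The sketch below is illustrative and assumes the ImageMagick `convert` and `tesseract` executables are on the PATH (ImageMagick 7 prefers `magick`, but `convert` is still accepted). The preprocessing options (`-colorspace Gray`, `-deskew 40%`, `-threshold 50%`) are common choices and not the settings used in the paper. The function names and the input file `scan.png` are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(src: str, workdir: str = ".", lang: str = "eng") -> list[list[str]]:
    """Return the ImageMagick and Tesseract command lines for one page image.

    Hypothetical helper: the preprocessing options are illustrative, not the
    configuration described in the paper.
    """
    src_path = Path(src)
    cleaned = str(Path(workdir) / f"{src_path.stem}_clean.png")
    out_base = str(Path(workdir) / src_path.stem)  # Tesseract appends ".txt"
    magick = [
        "convert", src,
        "-colorspace", "Gray",   # drop color information
        "-deskew", "40%",        # straighten slightly rotated scans
        "-threshold", "50%",     # simple global binarization
        cleaned,
    ]
    tesseract = ["tesseract", cleaned, out_base, "-l", lang]
    return [magick, tesseract]


def run_pipeline(src: str, workdir: str = ".", lang: str = "eng") -> str:
    """Run both steps and return the recognized text (needs both tools installed)."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    for cmd in build_commands(src, workdir, lang):
        subprocess.run(cmd, check=True)  # raise if either tool fails
    return (Path(workdir) / f"{Path(src).stem}.txt").read_text(encoding="utf-8")


if __name__ == "__main__":
    # Print the commands for a hypothetical input page instead of running them.
    for cmd in build_commands("scan.png"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to inspect or adjust the preprocessing before running the tools on real scans.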
M L, Prof. S., P J, Dr. A., & D N, S. (2016). Document Image Analysis Using Imagemagick and Tesseract-ocr. IARJSET, 3(5), 108–112. https://doi.org/10.17148/iarjset.2016.3523