Layout analysis, the extraction of text lines from a document image and the identification of their reading order, is an important step in converting the document into a searchable electronic representation. Projection methods are typically employed for extracting text lines from Arabic script documents. Although projection methods achieve good accuracy on clean, skew-free documents, their performance drops under challenging conditions (border noise, skew, complex layouts, etc.). This chapter presents a layout analysis system for extracting text lines in reading order from scanned Arabic script document images written in different languages (Arabic, Urdu, Persian, etc.) and different styles (Naskh, Nastaliq, etc.). The presented system is based on a suitable combination of well-established techniques for analyzing Latin script documents that have proven to be robust against different types of document image degradation.
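For readers unfamiliar with the baseline mentioned above, a horizontal projection method can be sketched as follows. This is only an illustration of the generic projection idea, not the system presented in the chapter. The function names, the `min_ink` threshold, and the toy `page` array are assumptions made for this example. The image is taken to be already binarized (1 = ink, 0 = background) and skew-free, which is exactly the condition under which projection methods work well.

```python
from typing import List, Tuple


def horizontal_projection(image: List[List[int]]) -> List[int]:
    """Count ink pixels in every row of a binary image (1 = ink)."""
    return [sum(row) for row in image]


def extract_line_bands(image: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top_row, bottom_row) for each run of rows with enough ink.

    min_ink is a hypothetical threshold: rows with fewer ink pixels
    are treated as inter-line gaps.
    """
    profile = horizontal_projection(image)
    bands: List[Tuple[int, int]] = []
    start = None  # first row of the text line currently being tracked
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a text line begins
        elif ink < min_ink and start is not None:
            bands.append((start, y - 1))   # the text line ended on the previous row
            start = None
    if start is not None:                  # line touching the bottom edge
        bands.append((start, len(profile) - 1))
    return bands


# Toy 7x6 "page" with two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

print(extract_line_bands(page))  # [(1, 2), (5, 5)]
```

The sketch also shows why the approach is fragile. A skewed page smears neighboring lines into the same rows, and border noise adds ink to rows that should be empty, so the gaps in the profile disappear.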
Bukhari, S. S., Shafait, F., & Breuel, T. M. (2012). Layout Analysis of Arabic Script Documents. In Guide to OCR for Arabic Scripts (pp. 35–53). Springer London. https://doi.org/10.1007/978-1-4471-4072-6_2