Layout Analysis of Arabic Script Documents

  • Bukhari S
  • Shafait F
  • Breuel T

Abstract

Layout analysis, the extraction of text lines from a document image and the identification of their reading order, is an important step in converting the document into a searchable electronic representation. Projection methods are typically employed to extract text lines from Arabic script documents. Although projection methods achieve good accuracy on clean, skew-free documents, their performance drops in challenging conditions (border noise, skew, complex layouts, etc.). This chapter presents a layout analysis system that extracts text lines in reading order from scanned Arabic script document images written in different languages (Arabic, Urdu, Persian, etc.) and different styles (Naskh, Nastaliq, etc.). The system combines several well-established techniques for analyzing Latin script documents that have proven robust against different types of document image degradation.
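To make the projection baseline mentioned in the abstract concrete, here is a minimal sketch of horizontal projection-profile line splitting. It is not the system presented in the chapter. The function name `split_lines_by_projection`, the `min_ink` threshold, and the toy `page` are assumptions made only for illustration; the input is assumed to be an already binarized image with 1 for ink and 0 for background.

```python
def split_lines_by_projection(binary, min_ink=1):
    """Split a binarized page into text-line row ranges.

    binary  : list of rows, each a list of 0/1 pixels (1 = ink); assumed input format.
    min_ink : hypothetical threshold, the minimum ink pixels for a row to count as text.
    Returns a list of (top, bottom) row indices, with bottom exclusive.
    """
    # Horizontal projection profile: ink count per pixel row.
    profile = [sum(row) for row in binary]

    lines = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # entering a run of text rows
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # leaving a run: close the line
            start = None
    if start is not None:                  # text run touches the bottom edge
        lines.append((start, len(profile)))
    return lines


# Toy page: 12 rows x 20 columns with two horizontal bands of ink.
page = [[0] * 20 for _ in range(12)]
for y in range(2, 4):
    for x in range(3, 17):
        page[y][x] = 1
for y in range(7, 10):
    for x in range(5, 15):
        page[y][x] = 1

print(split_lines_by_projection(page))
```

The sketch relies on blank pixel rows separating consecutive lines. Skew or border noise can fill those gaps in the profile, which is consistent with the drop in accuracy that the abstract attributes to projection methods on degraded documents.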

Cite

Bukhari, S. S., Shafait, F., & Breuel, T. M. (2012). Layout Analysis of Arabic Script Documents. In Guide to OCR for Arabic Scripts (pp. 35–53). Springer London. https://doi.org/10.1007/978-1-4471-4072-6_2
