Layout Analysis of Arabic Script Documents

Syed Saqib Bukhari; Faisal Shafait; Thomas M. Breuel

Book Chapter

Springer London, (2012), 35-53

DOI: 10.1007/978-1-4471-4072-6_2

Abstract

Layout analysis, the extraction of text lines from a document image and the identification of their reading order, is an important step in converting the document into a searchable electronic representation. Projection methods are typically employed to extract text lines from Arabic script documents. Although projection methods achieve good accuracy on clean, skew-free documents, their performance drops in challenging situations (border noise, skew, complex layouts, etc.). This chapter presents a layout analysis system that extracts text lines in reading order from scanned Arabic script document images written in different languages (Arabic, Urdu, Persian, etc.) and different styles (Naskh, Nastaliq, etc.). The system is based on a suitable combination of well-established techniques for analyzing Latin script documents that have proven robust against different types of document image degradation.
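
For readers unfamiliar with the projection methods the abstract refers to, the following is a minimal sketch of horizontal projection-profile line extraction. It illustrates only the baseline technique, not the system presented in the chapter. The function names, the `min_ink` threshold, and the toy `page` array are assumptions made for this example.

```python
def horizontal_projection(image):
    """Count ink pixels in each row of a binary image (1 = ink, 0 = background)."""
    return [sum(row) for row in image]


def find_text_lines(image, min_ink=1):
    """Return (top, bottom) row ranges, bottom exclusive, where the profile is >= min_ink.

    min_ink is a hypothetical threshold chosen for this sketch.
    """
    lines = []
    start = None
    for y, count in enumerate(horizontal_projection(image)):
        if count >= min_ink and start is None:
            start = y                      # a run of inked rows begins
        elif count < min_ink and start is not None:
            lines.append((start, y))       # the run ended at the previous row
            start = None
    if start is not None:                  # run reaches the bottom of the image
        lines.append((start, len(image)))
    return lines


# Toy binary "page": two text lines separated by blank rows.
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
print(find_text_lines(page))  # [(1, 3), (5, 6)]
```

The sketch relies on blank rows separating adjacent lines. On a skewed page, or one with border noise, inked rows from neighbouring lines overlap and the gaps in the profile disappear, which is consistent with the performance drop the abstract describes.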

Citation (APA)

Bukhari, S. S., Shafait, F., & Breuel, T. M. (2012). Layout Analysis of Arabic Script Documents. In Guide to OCR for Arabic Scripts (pp. 35–53). Springer London. https://doi.org/10.1007/978-1-4471-4072-6_2
