Benchmarking Strategy for Arabic Screen-Rendered Word Recognition

  • Slimane F
  • Kanoun S
  • Hennebert J
  • et al.

Abstract

This chapter presents a new benchmarking strategy for Arabic screen-based word recognition. Firstly, we report on the creation of the new APTI (Arabic Printed Text Image) database. This database is designed for large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style word recognition systems in Arabic. Such systems take a text image as input and output the character string corresponding to the text in the image. The challenges addressed by the database lie in the variability of the sizes, fonts and styles used to generate the images. Particular attention is given to low-resolution images, where anti-aliasing introduces noise on the characters to be recognized. The database contains 45,313,600 single-word images totalling more than 250 million characters. Ground truth annotation is provided for each image in an XML file. The annotation includes the number of characters, the number of pieces of Arabic words (PAWs), the sequence of characters, and the size, style and font used to generate each image, among other fields. Secondly, we describe the Arabic Recognition Competition: Multi-Font Multi-Size Digitally Represented Text held in the
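To make the annotation fields listed in the abstract concrete, here is a minimal Python sketch of how a per-image ground-truth record might be read. The element and attribute names (`wordImage`, `content`, `nChars`, `nPaws`, `font`) and the font name `ExampleFont` are hypothetical and chosen only for illustration. They are not the actual APTI schema, which the abstract does not specify.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical annotation layout: every element and attribute name here is an
# assumption made for illustration, not the real APTI file format.
SAMPLE_XML = """<wordImage id="sample_0001">
  <content transcription="كتاب" nChars="4" nPaws="2"/>
  <font name="ExampleFont" style="plain" size="10"/>
</wordImage>"""


@dataclass
class WordAnnotation:
    transcription: str  # sequence of characters shown in the image
    n_chars: int        # number of characters
    n_paws: int         # number of pieces of Arabic words (PAWs)
    font: str
    style: str
    size: int


def parse_annotation(xml_text: str) -> WordAnnotation:
    """Parse one ground-truth record in the hypothetical layout above."""
    root = ET.fromstring(xml_text)
    content = root.find("content")
    font = root.find("font")
    if content is None or font is None:
        raise ValueError("annotation is missing <content> or <font>")
    return WordAnnotation(
        transcription=content.get("transcription", ""),
        n_chars=int(content.get("nChars", "0")),
        n_paws=int(content.get("nPaws", "0")),
        font=font.get("name", ""),
        style=font.get("style", ""),
        size=int(font.get("size", "0")),
    )


annotation = parse_annotation(SAMPLE_XML)
print(annotation)
```

A recognizer's output string could then be compared against `annotation.transcription`, and results grouped by `font`, `style` and `size` to study each source of variability separately.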

Slimane, F., Kanoun, S., Hennebert, J., Ingold, R., & Alimi, A. M. (2012). Benchmarking Strategy for Arabic Screen-Rendered Word Recognition. In Guide to OCR for Arabic Scripts (pp. 423–450). Springer London. https://doi.org/10.1007/978-1-4471-4072-6_18
