Benchmarking Strategy for Arabic Screen-Rendered Word Recognition

Fouad Slimane; Slim Kanoun; Jean Hennebert; Rolf Ingold; Adel M. Alimi

Book Chapter

Springer London, (2012), 423-450

DOI: 10.1007/978-1-4471-4072-6_18

Abstract

This chapter presents a new benchmarking strategy for Arabic screen-based word recognition. First, we report on the creation of the new APTI (Arabic Printed Text Image) database. This database is intended for large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style word recognition systems in Arabic. Such systems take a text image as input and output a character string corresponding to the text in the image. The challenges addressed by the database lie in the variability of the sizes, fonts and styles used to generate the images. A focus is also placed on low-resolution images, where anti-aliasing generates noise on the characters being recognized. The database contains 45,313,600 single-word images totalling more than 250 million characters. Ground truth annotation is provided for each image in an XML file. The annotation includes the number of characters, the number of pieces of Arabic words (PAWs), the sequence of characters, the size, the style, the font used to generate each image, etc. Second, we describe the Arabic Recognition Competition: Multi-Font Multi-Size Digitally Represented Text held in the

Cite

Slimane, F., Kanoun, S., Hennebert, J., Ingold, R., & Alimi, A. M. (2012). Benchmarking Strategy for Arabic Screen-Rendered Word Recognition. In Guide to OCR for Arabic Scripts (pp. 423–450). Springer London. https://doi.org/10.1007/978-1-4471-4072-6_18
