UOHTD: Urdu Offline Handwritten Text Dataset

Aftab Rafique; M. Ishtiaq

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022), vol. 13639 LNCS, pp. 498–511

DOI: 10.1007/978-3-031-21648-0_34

Abstract

In this paper, we present the Urdu Offline Handwritten Text Dataset (UOHTD), built by collecting 800 handwritten Urdu samples from 800 native-speaker writers. The dataset consists of images of the handwritten samples, scanned at multiple spatial resolutions. A total of 8,000 text lines and 40,000 words were extracted from the sample pages as image patches and verified manually and formally against a ground-truth database. Machine learning tools were used to extract the sample pages and segment them into lines and words. Initial trials on demographic (gender and age-group) classification of Urdu writers using CNNs on UOHTD samples have produced promising results: 85% for gender and 79% for age-group classification. The database will be made available to researchers worldwide for the study of various handwriting-related topics, including text recognition; identification of a writer's age, ethnicity, demographics, gender, and handedness; and verification.

Citation (APA)

Rafique, A., & Ishtiaq, M. (2022). UOHTD: Urdu Offline Handwritten Text Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13639 LNCS, pp. 498–511). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21648-0_34