UOHTD: Urdu Offline Handwritten Text Dataset

Abstract

We present the Urdu Offline Handwritten Text Dataset (UOHTD), built from 800 Urdu handwritten samples written by 800 native Urdu writers. The dataset consists of images of the written text samples, scanned at multiple spatial resolutions. From the sample pages, 8,000 text lines and 40,000 word patches have been extracted and checked manually and formally against a ground-truth database. Machine learning tools were used to extract the sample pages and segment them into lines and words. Initial trials on demographic classification of Urdu writers (gender and age group) using CNNs on UOHTD samples have produced promising results: 85% for gender and 79% for age-group classification. The database will be made available to researchers worldwide for the study of handwriting-related topics, including text recognition and identification of the writer’s age, ethnicity, demographics, gender, and handedness, as well as verification.

Cite

Rafique, A., & Ishtiaq, M. (2022). UOHTD: Urdu Offline Handwritten Text Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13639 LNCS, pp. 498–511). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21648-0_34
