We present the Urdu Offline Handwritten Text Dataset (UOHTD), a collection of 800 Urdu handwriting samples written by 800 native writers. The dataset consists of images of the written text samples scanned at multiple spatial resolutions. From the sample pages, 8,000 text lines and 40,000 words have been extracted as patches and checked both manually and formally against a ground truth database. Machine learning tools were used to extract the sample pages and segment them into lines and words. Initial trials on demographic (gender and age group) classification of Urdu writers, using UOHTD samples and CNNs, have produced promising results (85% for gender and 79% for age group classification). The database will be made available to researchers worldwide for the study of various handwriting-related topics, including text recognition; identification of the writer's age, ethnicity, demographics, gender, and handedness; and verification.
CITATION STYLE
Rafique, A., & Ishtiaq, M. (2022). UOHTD: Urdu Offline Handwritten Text Dataset. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13639 LNCS, pp. 498–511). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-21648-0_34