Design and Validation of the Novel Perso-Arabic Database in Shahmukhi Punjabi Script

Abstract

This paper presents the design and validation of a novel dataset (SMHaroof) of printed alphabets in the Shahmukhi script of Punjabi, a context-specific language of the Perso-Arabic family. The dataset contributes to research in computational linguistics, artificial intelligence, pattern recognition, and optical character recognition. The dataset, with its subcategories, variants, and versions, is publicly available for non-commercial research and academic use. SMHaroof is the first of its kind, designed in multiple categories of isolated context-specific forms of the characters in two fonts, Nasta’leeq and Naskh. It is available in grayscale, bitonal, and RGB versions, comprising 66,728 (56,744 + 9,984) images. Multiple artificial neural networks (ANNs) and machine learning techniques were used to validate the dataset. A computer program was developed to generate the dataset automatically, with a user-controlled data augmentation feature. The auto-generation procedure described in this research is universal and applicable to other language scripts. The validation results range from 74% to 92%, depending on the technique.

Cite

Rafique, H., & Javid, T. (2025). Design and Validation of the Novel Perso-Arabic Database in Shahmukhi Punjabi Script. Data Intelligence, 7(3), 745–775. https://doi.org/10.3724/2096-7004.di.2025.0035
