Word spotting in Urdu is among the most challenging tasks in image processing, and word spotting in handwritten Urdu text is even more so. In handwritten Urdu documents, the same word varies significantly from one writer to another. Differences in orientation and handwriting style make it difficult for a word spotting system to correctly recognize instances of a keyword. In this research, we aim to overcome this hurdle. We propose a system that takes a database of handwritten Urdu text and generates random yet similar images to improve the classifier's ability to recognize variations caused by differences in handwriting. For image generation, we used geometric transformations and variants of the Generative Adversarial Network (GAN). For word spotting, Histogram of Oriented Gradients (HOG) features are extracted from ligature images and then used to train a Long Short-Term Memory (LSTM) network for classification. This is the first study that focuses on improving word spotting by generating arbitrary samples using GANs and their variants. The system achieved a promising recognition rate of 98.96% when trained with samples generated using Cycle-GANs.
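To make the feature-extraction step concrete, the following is a minimal sketch of a simplified HOG-style descriptor, not the authors' implementation. It assumes a grayscale ligature image given as a list of rows, and it makes several illustrative choices that the abstract does not specify: a 4-pixel cell, 9 unsigned orientation bins, no overlapping block normalization, and one feature vector per column of cells so that the result forms a sequence an LSTM could consume. The function names `cell_histogram` and `hog_sequence` are hypothetical.

```python
import math

def cell_histogram(img, top, left, size, bins=9):
    """Unsigned gradient-orientation histogram for one size x size cell."""
    h, w = len(img), len(img[0])
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    for y in range(top, top + size):
        for x in range(left, left + size):
            # Central differences, clamped at the image border.
            gx = img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]
            gy = img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]
            mag = math.hypot(gx, gy)
            # Fold the angle into [0, 180) so opposite directions share a bin.
            ang = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(ang // bin_width) % bins] += mag
    return hist

def hog_sequence(img, cell=4, bins=9):
    """Return one L2-normalised feature vector per column of cells.

    Assumption: columns are emitted left to right; reverse the list
    if a right-to-left order suits the script better.
    """
    h, w = len(img), len(img[0])
    seq = []
    for left in range(0, w - cell + 1, cell):
        vec = []
        for top in range(0, h - cell + 1, cell):
            vec.extend(cell_histogram(img, top, left, cell, bins))
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        seq.append([v / norm for v in vec])
    return seq

# Toy 8x8 image with a vertical edge between columns 3 and 4.
toy = [[0] * 4 + [255] * 4 for _ in range(8)]
features = hog_sequence(toy)
```

For the toy image, `features` holds two time steps of 18 values each (two cells per column, 9 bins per cell). In practice a library implementation of HOG with block normalization would likely be preferred; the sketch only illustrates how per-cell orientation histograms can be arranged into a sequence.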
CITATION STYLE
Farooqui, F. F., Hassan, M., Younis, M. S., & Siddhu, M. K. (2020). Offline Hand Written Urdu Word Spotting Using Random Data Generation. IEEE Access, 8, 131119–131136. https://doi.org/10.1109/ACCESS.2020.3010166