Recognition of cursive video text using a deep learning framework

13Citations
Citations of this article
27Readers
Mendeley users who have this article in their library.
Get full text

Abstract

This study focuses on cursive text recognition appearing in videos, using a complete framework of deep neural networks. While mature video optical character recognition systems (V-OCRs) are available for text in non-cursive scripts, recognition of cursive scripts is marked by many challenges. These include complex and overlapping ligatures, contextdependent shape variations and presence of a large number of dots and diacritics. The authors present an analytical technique for recognition of cursive caption text that relies on a combination of convolutional and recurrent neural networks trained in an end-to-end framework. Text lines extracted from video frames are preprocessed to segment the background and are fed to a convolutional neural network for feature extraction. The extracted feature sequences are fed to different variants of bi-directional recurrent neural networks along with the ground truth transcription to learn sequence-to-sequence mapping. Finally, a connectionist temporal classification layer is employed to produce the final transcription. Experiments on a data set of more than 40,000 text lines from 11,192 video frames of various News channel videos reported an overall character recognition rate of 97.63%. The proposed work employs Urdu text as a case study but the findings can be generalised to other cursive scripts as well.

Cite

CITATION STYLE

APA

Mirza, A., & Siddiqi, I. (2020). Recognition of cursive video text using a deep learning framework. IET Image Processing, 14(14), 3444–3455. https://doi.org/10.1049/iet-ipr.2019.1070

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free