PWDB_13: A corpus of word-level printed document images from thirteen official Indic scripts

Abstract

We present PWDB_13, a word-level printed document image corpus covering thirteen official Indic scripts. It consists of 26,000 words distributed equally across the thirteen scripts (2,000 per script), collected by an automated process. We propose a realistic classification framework based on four major regions of India, which makes the work unique. Benchmarking is carried out on the printed script identification (PSI) problem, which is highly relevant in multi-script scenarios. The results are reported as impressive given the volume of the corpus and the intrinsic complexities of Indic scripts. PWDB_13 addresses the lack of a complete document image dataset covering all official Indic scripts and is freely available to researchers for noncommercial use.

Cite

Obaidullah, S. M., Halder, C., Das, N., & Roy, K. (2016). PWDB_13: A corpus of word-level printed document images from thirteen official indic scripts. In Advances in Intelligent Systems and Computing (Vol. 404, pp. 233–242). Springer Verlag. https://doi.org/10.1007/978-81-322-2695-6_21
