PWDB_13: A corpus of word-level printed document images from thirteen official Indic scripts

Sk Md Obaidullah; Chayan Halder; Nibaran Das; Kaushik Roy

Conference Proceedings


Advances in Intelligent Systems and Computing (2016), Vol. 404, pp. 233–242

DOI: 10.1007/978-81-322-2695-6_21


Abstract

We present PWDB_13, a word-level printed document image corpus covering the thirteen official Indic scripts. It consists of 26,000 words, distributed equally across the thirteen script types, and was collected by an automated process. We also propose a realistic classification framework based on four major regions of India, which makes this work unique. The corpus is benchmarked on the printed script identification (PSI) problem, which is highly relevant in multi-script scenarios. The results are impressive considering the volume of the corpus and the intrinsic complexity of Indic scripts. PWDB_13 will bridge the gap left by the lack of a complete document image dataset covering all official Indic scripts, and it is freely available to researchers for non-commercial use.
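The abstract states that the 26,000 words are split equally over thirteen scripts, which works out to 2,000 word images per script. The sketch below is not the authors' benchmarking protocol; it is a minimal illustration, under stated assumptions, of how a balanced corpus like this might be checked and split for a PSI experiment so that every script keeps the same share of test samples. The script labels (`script_00` to `script_12`), the 20% test share, and the function names are hypothetical placeholders.

```python
import random
from collections import Counter

TOTAL_WORDS = 26_000  # corpus size stated in the abstract
NUM_SCRIPTS = 13      # number of official Indic scripts covered


def words_per_script(total: int, num_scripts: int) -> int:
    """Return the per-script count for an equally distributed corpus."""
    if total % num_scripts != 0:
        raise ValueError("corpus cannot be split equally across scripts")
    return total // num_scripts


def stratified_split(labels, test_percent=20, seed=0):
    """Split sample indices so that every script keeps the same test share.

    Assumption: an 80/20 per-script split; the paper's own protocol may differ.
    """
    rng = random.Random(seed)
    by_script = {}
    for idx, label in enumerate(labels):
        by_script.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_script):
        indices = by_script[label]
        rng.shuffle(indices)
        # Integer arithmetic avoids floating-point rounding surprises.
        n_test = len(indices) * test_percent // 100
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return train, test


per_script = words_per_script(TOTAL_WORDS, NUM_SCRIPTS)  # 2000
# Hypothetical labels standing in for the thirteen script names.
labels = [f"script_{s:02d}" for s in range(NUM_SCRIPTS) for _ in range(per_script)]
train_idx, test_idx = stratified_split(labels)
test_counts = Counter(labels[i] for i in test_idx)
```

Stratifying per script keeps the class balance of the corpus intact in both partitions, so PSI accuracy is not skewed toward any one script by the split itself.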


Citation (APA)

Obaidullah, S. M., Halder, C., Das, N., & Roy, K. (2016). PWDB_13: A corpus of word-level printed document images from thirteen official Indic scripts. In Advances in Intelligent Systems and Computing (Vol. 404, pp. 233–242). Springer Verlag. https://doi.org/10.1007/978-81-322-2695-6_21
