We present PWDB_13, a word-level printed document image corpus covering thirteen official Indic scripts. It consists of 26,000 word images distributed equally across the thirteen scripts (2,000 per script), collected by an automated process. We propose a realistic classification framework based on four major regions of India, which makes this work distinctive. Benchmarking is carried out on the printed script identification (PSI) problem, which is highly relevant in multi-script scenarios. The results are encouraging given the volume of the corpus and the intrinsic complexities of Indic scripts. PWDB_13 addresses the lack of a complete document image dataset covering all official Indic scripts and is freely available to researchers for noncommercial use.
CITATION STYLE
Obaidullah, S. M., Halder, C., Das, N., & Roy, K. (2016). PWDB_13: A corpus of word-level printed document images from thirteen official Indic scripts. In Advances in Intelligent Systems and Computing (Vol. 404, pp. 233–242). Springer Verlag. https://doi.org/10.1007/978-81-322-2695-6_21