Separating indic scripts with ‘matra’—A precursor to script identification in multi-script documents

Sk Md Obaidullah; Chitrita Goswami; K. C. Santosh; Chayan Halder; Nibaran Das; Kaushik Roy

Conference Proceedings

Advances in Intelligent Systems and Computing (2017) 459 AISC 205-214

DOI: 10.1007/978-981-10-2104-6_19

6 Citations

8 Readers

Abstract

We present a new technique for separating Indic scripts based on the presence of matra (or shirorekha), using an optimized fractal geometry analysis (FGA) as the sole feature. Separating scripts that have a matra from those that do not can serve as a precursor that eases the subsequent script identification process. We consider two matra-based scripts, Bangla and Devanagari, as positive samples; the counter samples come from two scripts without matra, Roman and Urdu. Altogether, we used 1204 document images: 525 matra-based (325 Bangla and 200 Devanagari) and 679 without matra (370 Roman and 309 Urdu). For experimentation, we used three classifiers, multilayer perceptron (MLP), random forest (RF), and BayesNet (BN), with the aim of selecting the best performer. From a series of tests, we achieved an average accuracy of 96.44% with the MLP classifier.
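The abstract does not describe how the optimized FGA feature is computed. As a minimal sketch only, assuming a standard box-counting estimate of fractal dimension on a binarized document image (the function names `box_count` and `box_counting_dimension` and the box sizes are illustrative choices, not taken from the paper), such a feature could be computed along these lines:

```python
import numpy as np

def box_count(binary, size):
    """Count boxes of side `size` that contain at least one foreground pixel."""
    h, w = binary.shape
    # Zero-pad so both dimensions are multiples of the box size.
    padded = np.pad(binary, ((0, (-h) % size), (0, (-w) % size)), mode="constant")
    H, W = padded.shape
    blocks = padded.reshape(H // size, size, W // size, size)
    return int(blocks.any(axis=(1, 3)).sum())

def box_counting_dimension(binary, sizes=(1, 2, 4, 8, 16)):
    """Estimate fractal dimension D from N(s) ~ s^(-D) by a log-log line fit.

    Assumption: a plain box-counting estimator, standing in for the
    paper's unspecified 'optimized' FGA.
    """
    binary = np.asarray(binary, dtype=bool)
    if not binary.any():
        raise ValueError("image has no foreground pixels")
    counts = np.array([box_count(binary, s) for s in sizes], dtype=float)
    # The slope of log N(s) against log s is -D.
    slope, _ = np.polyfit(np.log(sizes), np.log(counts), 1)
    return -slope
```

On a fully filled square the estimate is 2, and on a single straight stroke it is 1; real script images fall between these extremes. The resulting value could then be passed as a feature to a classifier such as an MLP, but that pairing here is an illustration under the stated assumptions rather than the authors' pipeline.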

Citation (APA)

Obaidullah, S. M., Goswami, C., Santosh, K. C., Halder, C., Das, N., & Roy, K. (2017). Separating indic scripts with ‘matra’—A precursor to script identification in multi-script documents. In Advances in Intelligent Systems and Computing (Vol. 459 AISC, pp. 205–214). Springer Verlag. https://doi.org/10.1007/978-981-10-2104-6_19
