Publicly available datasets are important for character, word, and document recognition. A standardized dataset allows a fair and reliable comparison of the performance of different recognition algorithms. Research on Brahmi word recognition has achieved encouraging results; however, no standardized Brahmi dataset is publicly available. This paper presents the steps taken to produce such a dataset: data collection, segmentation, storage, labeling, and statistical distribution. A total of 7,011 images of Brahmi characters were collected. The dataset is divided into three categories: vowels, consonants, and compound characters. In total there are 170 classes, of which 4 are vowels, 27 are consonants, and 139 are compound characters. The images across the 170 classes are further divided into a training set of 6,475 images and a testing set of 536 images.
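The figures quoted above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not part of the authors' pipeline: the variable names are illustrative, and the test fraction and mean images per class are derived here from the reported totals rather than stated in the paper. The actual per-class distribution may well be uneven.

```python
# Figures as reported in the abstract.
class_counts = {"vowels": 4, "consonants": 27, "compound": 139}
split_sizes = {"train": 6475, "test": 536}

# Totals implied by the reported figures.
total_classes = sum(class_counts.values())
total_images = sum(split_sizes.values())

# Derived quantities (assumptions of this sketch, not reported in the paper).
test_fraction = split_sizes["test"] / total_images
mean_images_per_class = total_images / total_classes

print(f"classes: {total_classes}")                      # classes: 170
print(f"images: {total_images}")                        # images: 7011
print(f"test share: {test_fraction:.1%}")               # test share: 7.6%
print(f"mean per class: {mean_images_per_class:.1f}")   # mean per class: 41.2
```

The class counts sum to the stated 170 and the two splits sum to the stated 7,011 images, so the reported numbers are internally consistent.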
CITATION STYLE
Gautam, N., Chai, S. S., & Gautam, M. (2020). The Dataset for Printed Brahmi Word Recognition. In Lecture Notes in Networks and Systems (Vol. 106, pp. 125–133). Springer. https://doi.org/10.1007/978-981-15-2329-8_13