In this paper, we report our effort to build a multilingual structure, the Cursive and Language Adaptive Methodology (CALAM), for creating, annotating, and validating linguistic datasets. CALAM offers a scientific and systematic way to retrieve information through the design and development of an annotated corpus of handwritten text images. It is a useful tool for annotating multilingual handwritten image datasets (Hindi, English, Urdu, etc.). The annotation is not limited to grammatical tagging; structural markup is also applied. Handwritten text images are annotated hierarchically, from the handwritten form down to segmented lines, words, and components. The component-level markup is useful for finding strokes and lists of ligatures in Urdu. Along with this hierarchical access structure, CALAM provides indexing, insertion, searching, and deletion of words and phrases in handwritten forms. Apart from dataset retrieval, it also automatically generates an XML-tagged file for each annotated handwritten text image in the dataset.
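The hierarchy described above (form, lines, words, components) and the per-image XML output can be illustrated with a minimal Python sketch. The abstract does not give CALAM's actual schema, so every class, tag, and attribute name here (`Form`, `Line`, `Word`, `Component`, `form_to_xml`, `find_words`, and the `id`, `language`, `index`, `text`, and `label` attributes) is a hypothetical placeholder, not the authors' format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# NOTE: all names below are assumptions made for illustration; the real
# CALAM XML schema is not specified in the abstract.

@dataclass
class Component:
    label: str  # e.g. a stroke or ligature identifier

@dataclass
class Word:
    text: str
    components: List[Component] = field(default_factory=list)

@dataclass
class Line:
    words: List[Word] = field(default_factory=list)

@dataclass
class Form:
    form_id: str
    language: str
    lines: List[Line] = field(default_factory=list)

def form_to_xml(form: Form) -> ET.Element:
    """Serialize one annotated form as a nested form/line/word/component tree."""
    root = ET.Element("form", id=form.form_id, language=form.language)
    for line_no, line in enumerate(form.lines, start=1):
        line_el = ET.SubElement(root, "line", index=str(line_no))
        for word_no, word in enumerate(line.words, start=1):
            word_el = ET.SubElement(
                line_el, "word", index=str(word_no), text=word.text
            )
            for comp in word.components:
                ET.SubElement(word_el, "component", label=comp.label)
    return root

def find_words(root: ET.Element, text: str) -> List[ET.Element]:
    """Search the hierarchy for word elements whose text matches exactly."""
    return [w for w in root.iter("word") if w.get("text") == text]

# Build a tiny two-line form and serialize it.
form = Form(
    form_id="f001",
    language="en",
    lines=[
        Line(words=[Word("hello", [Component("stroke-1"), Component("stroke-2")])]),
        Line(words=[Word("world"), Word("hello")]),
    ],
)
root = form_to_xml(form)
xml_text = ET.tostring(root, encoding="unicode")
matches = find_words(root, "hello")
```

Nesting the elements keeps the line and word indices recoverable from the tree itself, so a search such as `find_words` can report where in the form each match occurs. Insertion and deletion would amount to adding or removing the corresponding elements.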
CITATION STYLE
Choudhary, P., & Nain, N. (2015). CALAM: Linguistic structure to annotate handwritten text image corpus. In Smart Innovation, Systems and Technologies (Vol. 32, pp. 449–460). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-81-322-2208-8_41