CALAM: Linguistic structure to annotate handwritten text image corpus

0Citations
Citations of this article
5Readers
Mendeley users who have this article in their library.
Get full text

Abstract

In this paper, we report our effort in building a multi linguistic structure Cursive and Language Adaptive Methodology (CALAM) to create, annotate and validate linguistic dataset. CALAM provides a way for fetching and retrieval of information in a scientific and systematic manner through design and development of an annotated corpus of handwritten text image. It is a useful tool to annotate multi-lingual handwritten image dataset (Hindi, English, and Urdu etc.). The annotation is not limited with the grammatical tagging, but structural markup is also done. Annotation of handwritten text image is done in a hierarchical manner starting from handwritten form to segmented lines, words, and components. The component level markup is useful for finding strokes and list of ligatures in Urdu language. Along with a hierarchical access structure, CALAM provides the functionalities of Indexing, Insertion, Searching and Deletion of words and phrases in handwritten form. Apart from dataset fetching and retrieval it also automatically generates XML tagged file for each annotated handwritten text image for all dataset.

Cite

CITATION STYLE

APA

Choudhary, P., & Nain, N. (2015). CALAM: Linguistic structure to annotate handwritten text image corpus. In Smart Innovation, Systems and Technologies (Vol. 32, pp. 449–460). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-81-322-2208-8_41

Register to see more suggestions

Mendeley helps you to discover research relevant for your work.

Already have an account?

Save time finding and organizing research with Mendeley

Sign up for free