CALAM: Linguistic structure to annotate handwritten text image corpus

Prakash Choudhary; Neeta Nain

Conference Proceedings

Smart Innovation, Systems and Technologies (2015), vol. 32, pp. 449-460

DOI: 10.1007/978-81-322-2208-8_41

Abstract

In this paper, we report our effort to build a multilingual structure, the Cursive and Language Adaptive Methodology (CALAM), for creating, annotating, and validating a linguistic dataset. CALAM supports scientific and systematic retrieval of information through the design and development of an annotated corpus of handwritten text images. It is a tool for annotating multilingual handwritten image datasets (Hindi, English, Urdu, etc.). The annotation is not limited to grammatical tagging; structural markup is applied as well. Handwritten text images are annotated hierarchically, from the handwritten form down to segmented lines, words, and components. Component-level markup is useful for finding strokes and listing ligatures in Urdu. Along with a hierarchical access structure, CALAM provides indexing, insertion, searching, and deletion of words and phrases in handwritten form. Beyond dataset retrieval, it also automatically generates an XML-tagged file for each annotated handwritten text image in the dataset.
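The hierarchy described in the abstract (form, then lines, words, and components, exported as XML) can be illustrated with a minimal Python sketch. This is not CALAM's actual schema or code: the element names (`form`, `line`, `word`, `component`), the attribute names, the ID format, and the helper functions `build_annotation` and `find_words` are all assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET


def build_annotation(form_id, language, lines):
    """Build a hypothetical form > line > word > component XML tree.

    `lines` is a list of lines; each line is a list of
    (word_text, [component_label, ...]) tuples. All tag and attribute
    names here are illustrative assumptions, not CALAM's real format.
    """
    form = ET.Element("form", {"id": form_id, "lang": language})
    for li, words in enumerate(lines, start=1):
        line_id = f"{form_id}-L{li}"
        line_el = ET.SubElement(form, "line", {"id": line_id})
        for wi, (text, components) in enumerate(words, start=1):
            word_id = f"{line_id}-W{wi}"
            word_el = ET.SubElement(
                line_el, "word", {"id": word_id, "text": text}
            )
            for ci, label in enumerate(components, start=1):
                ET.SubElement(
                    word_el,
                    "component",
                    {"id": f"{word_id}-C{ci}", "label": label},
                )
    return form


def find_words(form, text):
    """Return the IDs of all word elements whose transcription equals `text`."""
    return [w.get("id") for w in form.iter("word") if w.get("text") == text]


# Toy annotation: one form with two lines.
example = build_annotation(
    "F001",
    "en",
    [
        [("hello", ["h", "ello"]), ("world", ["world"])],
        [("hello", ["hello"])],
    ],
)

# Serialize the tree to an XML string, one file's worth per annotated image.
xml_text = ET.tostring(example, encoding="unicode")
matches = find_words(example, "hello")
```

Nesting components under words under lines keeps every level addressable by a stable ID, so a word search can return positions in the hierarchy rather than raw text offsets.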

Cite

Choudhary, P., & Nain, N. (2015). CALAM: Linguistic structure to annotate handwritten text image corpus. In Smart Innovation, Systems and Technologies (Vol. 32, pp. 449–460). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-81-322-2208-8_41