Abstract
This paper presents the semi-semantic part-of-speech annotation of the URDU.KON-TB treebank, developed for the South Asian language Urdu, and its evaluation via Krippendorff's α. The part-of-speech annotation, with additional morphological and semantic subcategories, provides the treebank with sufficient encoded information. The corpus is collected from the Urdu Wikipedia and newspapers. The sentences were annotated manually to ensure high annotation quality. The inter-annotator agreement obtained after evaluation is 0.964, which lies in the range of perfect agreement on the scale used for interpretation. Urdu is a comparatively under-resourced language, and the development of a treebank with rich part-of-speech annotation will have a significant impact on the state of the art for Urdu language processing.
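As a rough illustration of the agreement measure named above, the following is a minimal sketch of Krippendorff's α for nominal labels, computed from a coincidence matrix. It is not the paper's evaluation code: the function name `krippendorff_alpha_nominal`, the input layout (one list of labels per annotated token), and the placeholder tags `NN` and `VB` are all assumptions made for this example, and the toy data is not taken from the URDU.KON-TB treebank.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal data.

    `units` is a list with one entry per annotated item (e.g. a token);
    each entry is the list of labels the annotators assigned to that item.
    Missing annotations are given as None and ignored.
    (Hypothetical helper written for this illustration.)
    """
    # Coincidence matrix: every ordered pair of labels given to the same
    # item by different annotators, weighted by 1 / (m - 1).
    coincidences = Counter()
    for labels in units:
        labels = [label for label in labels if label is not None]
        m = len(labels)
        if m < 2:
            continue  # an item with fewer than two labels cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals n_c per label and grand total n.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one item with two or more labels")

    # Observed and expected disagreement (both scaled by n, which cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        raise ValueError("alpha is undefined when only one label is used")

    return 1.0 - observed / expected


# Toy data: two annotators, ten tokens, placeholder tags (assumed, not real).
annotator_1 = ["NN", "VB", "NN", "NN", "NN", "NN", "NN", "NN", "VB", "NN"]
annotator_2 = ["VB", "VB", "VB", "NN", "NN", "VB", "NN", "NN", "NN", "NN"]
toy_units = [list(pair) for pair in zip(annotator_1, annotator_2)]
toy_alpha = krippendorff_alpha_nominal(toy_units)
```

A value of 1 indicates complete agreement and a value near 0 indicates agreement no better than chance, so the reported 0.964 sits close to the top of that range.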
Cite
Abbas, Q. (2020). Semi-semantic part of speech annotation and evaluation. In LAW 2014 - 8th Linguistic Annotation Workshop, in conjunction with COLING 2014 - Proceedings of the Workshop (pp. 75–81). Association for Computational Linguistics (ACL). https://doi.org/10.3115/v1/w14-4911