Building a hierarchical annotated corpus of Urdu: The URDU.KON-TB treebank

Qaiser Abbas

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2012) 7181 LNCS(PART 1) 66-79

DOI: 10.1007/978-3-642-28604-9_6

Abstract

This work aims to develop a representative treebank for the South Asian language Urdu. Urdu is a comparatively under-resourced language, and a reliable treebank will have a significant impact on the state of the art in Urdu language processing. For the URDU.KON-TB treebank described here, a part-of-speech (POS) tagset, a syntactic tagset and a functional tagset are proposed. The treebank is built on an existing corpus of 19 million words of Urdu. POS tagging and annotation of a selected set of sentences from different sub-domains of this corpus are being carried out manually, and the work completed to date is presented here. The hierarchical annotation scheme adopted combines a phrase structure (PS) with a hybrid dependency structure (HDS). © 2012 Springer-Verlag.

Cite

Abbas, Q. (2012). Building a hierarchical annotated corpus of Urdu: The URDU.KON-TB treebank. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 7181 LNCS, pp. 66–79). https://doi.org/10.1007/978-3-642-28604-9_6
