The Indian Language Discourse Project aims to develop a large corpus annotated with various types of discourse relations, both explicit and implicit. As an initial step towards this goal, we have annotated corpora in three languages, Hindi, Tamil and Malayalam, which belong to the two major language families in India: Indo-Aryan and Dravidian. In this paper we describe our initial experiments in annotating the corpora of all three languages, which are drawn from the health domain. The initial experiment brought out various types of discourse connectives in the three languages and showed how they vary across the languages. Even this preliminary study revealed cross-linguistic variation among the three languages. We report the inter-annotator agreement for all three languages. © 2014 Springer-Verlag Berlin Heidelberg.
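The abstract does not say which agreement measure was used. As an illustration only, here is a minimal sketch of Cohen's kappa, one common measure of agreement between two annotators who label the same items. The function name `cohens_kappa` and the example labels ("explicit", "implicit") are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items where both annotators chose the same label.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    p_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical relation labels from two annotators on the same four items.
annotator_1 = ["explicit", "implicit", "explicit", "explicit"]
annotator_2 = ["explicit", "implicit", "implicit", "explicit"]
print(cohens_kappa(annotator_1, annotator_2))  # observed 0.75, chance 0.5, kappa 0.5
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is often preferred over simple overlap when label distributions are skewed.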
Lalitha Devi, S., Lakshmi, S., & Gopalan, S. (2014). Discourse tagging for Indian languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8403 LNCS, pp. 469–480). Springer Verlag. https://doi.org/10.1007/978-3-642-54906-9_38