Discourse tagging for Indian languages

Abstract

The Indian Language Discourse Project aims to develop a large corpus annotated with various types of discourse relations, both explicit and implicit. As an initial step, we have annotated corpora in three languages, Hindi, Tamil and Malayalam, which belong to two major language families in India: Indo-Aryan and Dravidian. In this paper we describe our initial experiments in annotating the corpora for all three languages; the corpora are drawn from the health domain. The initial experiments brought out various types of discourse connectives in the three languages and showed how they vary across the languages. Even this preliminary study revealed cross-linguistic variation among the three languages. We report the inter-annotator agreement for all three languages. © 2014 Springer-Verlag Berlin Heidelberg.

Cite

Lalitha Devi, S., Lakshmi, S., & Gopalan, S. (2014). Discourse tagging for Indian languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8403 LNCS, pp. 469–480). Springer Verlag. https://doi.org/10.1007/978-3-642-54906-9_38
