Discourse tagging for Indian languages

Sobha Lalitha Devi; S. Lakshmi; Sindhuja Gopalan

Conference Proceedings


Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2014), Vol. 8403 LNCS (Part 1), pp. 469–480

DOI: 10.1007/978-3-642-54906-9_38


Abstract

The Indian Language Discourse Project aims to develop a large corpus annotated with various types of explicit and implicit discourse relations. As an initial step, we have annotated corpora in three languages, Hindi, Tamil and Malayalam, which belong to the two major language families in India: Indo-Aryan and Dravidian. In this paper we describe our initial experiments in annotating the corpora of all three languages, which are drawn from the health domain. The initial experiment brought out the various types of discourse connectives in the three languages and how they vary across them; even this preliminary study revealed cross-linguistic variation among the three languages. We also report the inter-annotator agreement for all three languages. © 2014 Springer-Verlag Berlin Heidelberg.
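The abstract reports inter-annotator agreement but does not say which coefficient was used. As a purely illustrative sketch, assuming a two-annotator setting and Cohen's kappa (a common choice, not confirmed as the authors' measure), agreement on discourse-relation labels could be computed as below. The function name `cohen_kappa` and the label values are hypothetical and not taken from the paper's data.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label sequences")
    n = len(labels_a)
    # Observed agreement: fraction of items both annotators labelled identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement, from each annotator's marginal label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_e = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if p_e == 1.0:
        # Degenerate case: both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical relation labels assigned by two annotators to six connectives.
annotator_1 = ["cause", "contrast", "cause", "conjunction", "cause", "contrast"]
annotator_2 = ["cause", "contrast", "conjunction", "conjunction", "cause", "cause"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))  # prints 0.478
```

Here observed agreement is 4/6 and chance agreement is 13/36, giving a kappa of 11/23. For more than two annotators, a multi-rater coefficient such as Fleiss' kappa would be the analogous choice.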


Citation (APA)

Lalitha Devi, S., Lakshmi, S., & Gopalan, S. (2014). Discourse tagging for Indian languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 8403 LNCS, pp. 469–480). Springer Verlag. https://doi.org/10.1007/978-3-642-54906-9_38
