Significance of an accurate Sandhi-splitter in shallow parsing of Dravidian languages

V. V. Devadath; Dipti Misra Sharma

Conference Proceedings (Open Access)

Proceedings of the Annual Meeting of the Association for Computational Linguistics (2016), Vol. 2016-August, pp. 37–42

DOI: 10.18653/v1/p16-3006

Abstract

This paper evaluates the challenges involved in shallow parsing of Dravidian languages, which are highly agglutinative and morphologically rich. Text processing tasks in these languages are not trivial because multiple words concatenate to form a single string, with morpho-phonemic changes at the point of concatenation. This phenomenon, known as Sandhi, in turn complicates the identification of individual words. Shallow parsing is the task of identifying correlated groups of words in a raw sentence. The current work studies the effect of Sandhi on building shallow parsers for Dravidian languages by evaluating its effect on Malayalam, one of the main languages of the Dravidian family. We provide an in-depth analysis of the effect of Sandhi on developing a robust shallow parser pipeline, with experimental results showing how sensitive the individual components of the shallow parser are to the accuracy of a Sandhi splitter. Our work can serve as a guide for building robust text processing systems in Dravidian languages.

Cite

Devadath, V. V., & Sharma, D. M. (2016). Significance of an accurate Sandhi-splitter in shallow parsing of Dravidian languages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2016-August, pp. 37–42). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-3006
