Abstract
This paper evaluates the challenges involved in shallow parsing of Dravidian languages, which are highly agglutinative and morphologically rich. Text processing tasks in these languages are not trivial because multiple words concatenate to form a single string, with morpho-phonemic changes at the point of concatenation. This phenomenon, known as Sandhi, in turn complicates the identification of individual words. Shallow parsing is the task of identifying correlated groups of words in a raw sentence. The current work studies the effect of Sandhi on building shallow parsers for Dravidian languages by evaluating its effect on Malayalam, one of the main languages of the Dravidian family. We provide an in-depth analysis of the effect of Sandhi on developing a robust shallow parser pipeline, with experimental results emphasizing how sensitive the individual components of the shallow parser are to the accuracy of a Sandhi splitter. Our work can serve as a guide for building robust text processing systems for Dravidian languages.
Cite
Devadath, V. V., & Sharma, D. M. (2016). Significance of an accurate Sandhi-splitter in shallow parsing of Dravidian languages. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (Vol. 2016-August, pp. 37–42). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/p16-3006