Statistical sandhi splitter for agglutinative languages

Abstract

Sandhi splitting is a primary and important step in any natural language processing (NLP) application for languages with agglutinative morphology. This paper presents a statistical approach to building a sandhi splitter for agglutinative languages. The input to the model is a valid string in the language, and the output is a split of that string into one or more meaningful words. The approach comprises two stages, Segmentation and Word generation, both of which use conditional random fields (CRFs). Our approach is robust and language-independent. The results for two Dravidian languages, Telugu and Malayalam, show accuracies of 89.07% and 90.50%, respectively.
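One common way to apply a CRF to the Segmentation stage is to treat it as character-level sequence labelling: each character receives a label saying whether the string splits after it, and Viterbi decoding finds the highest-scoring label sequence under a linear-chain score. The sketch below assumes that framing. The label set (N/S), the character-window features, the `bias` feature and all weights are hypothetical illustration choices, not the authors' feature set, and no training is shown (weights are set by hand). Real sandhi also changes characters at the boundary; restoring those forms is the job of the Word generation stage, which this sketch does not model.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label set: "N" = no split after this character,
# "S" = the string splits after this character.
LABELS = ("N", "S")


def char_features(text: str, i: int) -> List[str]:
    """Hypothetical character-window features for position i."""
    return [
        "bias",  # always-on feature, lets a weight act as a per-label prior
        f"c0={text[i]}",
        f"c-1={text[i - 1]}" if i > 0 else "c-1=<s>",
        f"c+1={text[i + 1]}" if i + 1 < len(text) else "c+1=</s>",
    ]


def viterbi(text: str,
            emit: Dict[Tuple[str, str], float],
            trans: Dict[Tuple[str, str], float]) -> List[str]:
    """Highest-scoring label sequence under a linear-chain CRF score.

    Score = sum of emission weights for (feature, label) pairs
          + sum of transition weights for (previous label, label) pairs.
    Missing weights count as 0.
    """
    n = len(text)
    if n == 0:
        return []

    def unary(i: int, y: str) -> float:
        return sum(emit.get((f, y), 0.0) for f in char_features(text, i))

    # best[i][y]: best score of any labelling of text[:i + 1] ending in y
    best: List[Dict[str, float]] = [{y: unary(0, y) for y in LABELS}]
    back: List[Dict[str, str]] = [{}]  # back-pointers to the previous label
    for i in range(1, n):
        scores: Dict[str, float] = {}
        ptrs: Dict[str, str] = {}
        for y in LABELS:
            prev = max(LABELS,
                       key=lambda p: best[i - 1][p] + trans.get((p, y), 0.0))
            scores[y] = best[i - 1][prev] + trans.get((prev, y), 0.0) + unary(i, y)
            ptrs[y] = prev
        best.append(scores)
        back.append(ptrs)

    # Follow back-pointers from the best final label.
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for i in range(n - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


def split_at(text: str, labels: Sequence[str]) -> List[str]:
    """Cut the surface string after every character labelled "S"."""
    pieces: List[str] = []
    start = 0
    for i, lab in enumerate(labels):
        if lab == "S" and i + 1 < len(text):
            pieces.append(text[start:i + 1])
            start = i + 1
    pieces.append(text[start:])
    return pieces


if __name__ == "__main__":
    # Toy, hand-set weights: splitting is penalised by default (bias),
    # but favoured after the character "b".
    toy_emit = {("bias", "S"): -1.0, ("c0=b", "S"): 2.0}
    labels = viterbi("abcd", toy_emit, {})
    print(labels, split_at("abcd", labels))
```

With these toy weights, only position 1 ("b") has a split score above the bias penalty, so the decoder labels the string `['N', 'S', 'N', 'N']` and `split_at` returns `['ab', 'cd']`. In a trained model the weights would come from CRF training on annotated split points, and the pieces would then be passed to the Word generation stage.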

Citation (APA)

Kuncham, P., Nelakuditi, K., Nallani, S., & Mamidi, R. (2015). Statistical sandhi splitter for agglutinative languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9041, pp. 164–172). Springer Verlag. https://doi.org/10.1007/978-3-319-18111-0_13