Statistical sandhi splitter for agglutinative languages

Prathyusha Kuncham; Kovida Nelakuditi; Sneha Nallani; Radhika Mamidi

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2015) 9041 164-172

DOI: 10.1007/978-3-319-18111-0_13

6 Citations

5 Readers

Abstract

Sandhi splitting is a primary and important step for any natural language processing (NLP) application for languages with agglutinative morphology. This paper presents a statistical approach to building a sandhi splitter for agglutinative languages. The input to the model is a valid string in the language, and the output is a split of that string into one or more meaningful words. The approach comprises two stages, Segmentation and Word generation, both of which use conditional random fields (CRFs). Our approach is robust and language independent. Results for two Dravidian languages, Telugu and Malayalam, show accuracies of 89.07% and 90.50%, respectively.
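The two-stage data flow described in the abstract can be illustrated with a minimal sketch. The abstract does not give the label scheme, the features, or the model interface, so everything below is an assumption made for illustration: the `B`/`N` boundary labels, the character-window features, and the names `char_features`, `segment`, and `split_sandhi` are hypothetical. The `label_fn` and `generate_fn` callables stand in for the trained CRF models of the Segmentation and Word generation stages, which are not implemented here.

```python
from typing import Callable, Dict, List


def char_features(word: str, i: int, window: int = 2) -> Dict[str, str]:
    """Context features for the character at position i (hypothetical feature set)."""
    feats = {"char": word[i]}
    for off in range(1, window + 1):
        # Pad with sentinel symbols when the window runs off either end.
        feats[f"prev{off}"] = word[i - off] if i - off >= 0 else "<s>"
        feats[f"next{off}"] = word[i + off] if i + off < len(word) else "</s>"
    return feats


def segment(word: str, labels: List[str]) -> List[str]:
    """Cut the word after every character labelled 'B' (assumed boundary label)."""
    if len(labels) != len(word):
        raise ValueError("need exactly one label per character")
    pieces, start = [], 0
    for i, label in enumerate(labels):
        # A boundary on the final character would create an empty piece, so skip it.
        if label == "B" and i + 1 < len(word):
            pieces.append(word[start:i + 1])
            start = i + 1
    pieces.append(word[start:])
    return pieces


def split_sandhi(
    word: str,
    label_fn: Callable[[List[Dict[str, str]]], List[str]],
    generate_fn: Callable[[str], str],
) -> List[str]:
    """Stage 1: label characters and segment. Stage 2: map each segment to a word."""
    feats = [char_features(word, i) for i in range(len(word))]
    labels = label_fn(feats)  # stand-in for a trained segmentation CRF
    return [generate_fn(piece) for piece in segment(word, labels)]
```

With a toy string and hand-written labels, `segment("xxyyy", ["N", "B", "N", "N", "N"])` yields the pieces `xx` and `yyy`. In a real system, `generate_fn` would restore the characters altered by sandhi at each boundary; the placeholder here only marks where that step would sit.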

Citation (APA)

Kuncham, P., Nelakuditi, K., Nallani, S., & Mamidi, R. (2015). Statistical sandhi splitter for agglutinative languages. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 9041, pp. 164–172). Springer Verlag. https://doi.org/10.1007/978-3-319-18111-0_13
