Corpus-based Sinhala Lexicon

Ruvan Weerasinghe; Dulip Herath; Viraj Welgama

Conference Proceedings

Proceedings of the 7th Workshop on Asian Language Resources, ALR 2009 - in conjunction with the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (2009) 17-23

DOI: 10.3115/1690299.1690302

6 Citations

97 Readers

Abstract

A lexicon is an important resource in any kind of language processing application. Corpus-based lexica have several advantages over traditional approaches. The lexicon developed for Sinhala was based on text obtained from a corpus of 10 million words drawn from diverse genres. The words extracted from the corpus have been labeled with part-of-speech categories defined according to a novel classification proposed for Sinhala. The lexicon reports 80% coverage over unrestricted text obtained from online sources. The lexicon has been implemented in the Lexical Markup Framework (LMF).
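
The abstract does not say how the coverage figure was measured. As a rough illustration only, the following Python sketch shows one common way to compute lexicon coverage over a tokenized text, reporting both token-level and type-level coverage. The function name `coverage`, the whitespace tokenization, and the toy word list are assumptions made for this example and are not taken from the paper.

```python
from collections import Counter


def coverage(tokens, lexicon):
    """Return (token_coverage, type_coverage) of `lexicon` over `tokens`.

    Token coverage counts every occurrence of a word; type coverage counts
    each distinct word once. Both are fractions between 0 and 1.
    """
    counts = Counter(tokens)
    total_tokens = sum(counts.values())
    if total_tokens == 0:
        return 0.0, 0.0
    covered_tokens = sum(n for word, n in counts.items() if word in lexicon)
    covered_types = sum(1 for word in counts if word in lexicon)
    return covered_tokens / total_tokens, covered_types / len(counts)


# Toy data (hypothetical, not drawn from the paper's corpus or lexicon).
lexicon = {"ගස", "මල", "ගෙදර"}
tokens = "ගස මල ගස පොත ගෙදර".split()  # naive whitespace tokenization

token_cov, type_cov = coverage(tokens, lexicon)
print(f"token coverage: {token_cov:.2f}, type coverage: {type_cov:.2f}")
```

Token-level and type-level coverage can differ considerably on real text, because frequent words dominate the token count, so a reported coverage figure is easier to interpret when the counting unit is stated.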

Cite

Weerasinghe, R., Herath, D., & Welgama, V. (2009). Corpus-based Sinhala Lexicon. In Proceedings of the 7th Workshop on Asian Language Resources, ALR 2009 - in conjunction with the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (pp. 17–23). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1690299.1690302
