Abstract
A lexicon is an important resource in any kind of language processing application. Corpus-based lexica have several advantages over traditional approaches. The lexicon developed for Sinhala was based on text obtained from a corpus of 10 million words drawn from diverse genres. The words extracted from the corpus have been labeled with part-of-speech categories defined according to a novel classification proposed for Sinhala. The lexicon reports 80% coverage over unrestricted text obtained from online sources. The lexicon has been implemented in the Lexical Markup Framework.
Cite
Weerasinghe, R., Herath, D., & Welgama, V. (2009). Corpus-based Sinhala Lexicon. In Proceedings of the 7th Workshop on Asian Language Resources, ALR 2009 - in conjunction with the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (pp. 17–23). Association for Computational Linguistics (ACL). https://doi.org/10.3115/1690299.1690302