A crowdsource-based approach for preparing Bangla POS tagged corpus

Shamim Ehsan; Sadia Tasnim Swarna; Sabir Ismail

Conference Proceedings

Advances in Intelligent Systems and Computing (2019) 814 463-473

DOI: 10.1007/978-981-13-1501-5_40

Abstract

Automated Parts of Speech tagging plays a vital role in natural language processing. For computational Bangla language processing, no large-scale Parts of Speech tagged corpus is available. There are two basic approaches to building such a corpus: applying hand-written rules or tagging automatically. A rule-based corpus requires experts in Bangla linguistics and is time-consuming to produce. An automated corpus requires an already tagged corpus for training, which is currently not available. Crowdsourcing can play a vital role in meeting both requirements. In this paper, we therefore propose a crowdsource-based approach to building a Bangla Parts of Speech tagged corpus. We use a standard tag set for Bangla. Raw documents are collected from various newspapers, books, and online sites. We first give contributors some examples of Parts of Speech and then provide them with data to tag. Finally, we analyze the resulting data and find that its accuracy is 95%.
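The abstract does not say how contributors' tags are combined or how accuracy is computed. As a rough illustration only, the sketch below assumes that each token is tagged by several contributors, that the final tag is chosen by majority vote, and that accuracy is the fraction of tokens whose chosen tag matches a reference tag. The function names, the example sentence, and the tags PRON, NOUN, and VERB are hypothetical and are not taken from the paper or its tag set.

```python
from collections import Counter


def majority_tag(votes):
    """Return the most frequent tag; ties go to the tag seen first."""
    return Counter(votes).most_common(1)[0][0]


def aggregate(annotations):
    """Collapse each token's crowd votes into a single tag.

    annotations: list of (token, [tag from each contributor]) pairs.
    """
    return [(token, majority_tag(votes)) for token, votes in annotations]


def accuracy(tagged, reference_tags):
    """Fraction of tokens whose aggregated tag equals the reference tag."""
    if len(tagged) != len(reference_tags):
        raise ValueError("tagged tokens and reference tags differ in length")
    correct = sum(tag == ref for (_, tag), ref in zip(tagged, reference_tags))
    return correct / len(reference_tags)


# Hypothetical crowd votes for the sentence "আমি ভাত খাই" ("I eat rice").
crowd_votes = [
    ("আমি", ["PRON", "PRON", "NOUN"]),
    ("ভাত", ["NOUN", "NOUN", "NOUN"]),
    ("খাই", ["VERB", "NOUN", "VERB"]),
]

tagged = aggregate(crowd_votes)
score = accuracy(tagged, ["PRON", "NOUN", "VERB"])
print(tagged, score)
```

Majority voting is only one possible aggregation rule; weighting contributors by their agreement with expert-tagged examples would be an equally plausible alternative under the same assumptions.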

Cite

Ehsan, S., Swarna, S. T., & Ismail, S. (2019). A crowdsource-based approach for preparing Bangla POS tagged corpus. In Advances in Intelligent Systems and Computing (Vol. 814, pp. 463–473). Springer Verlag. https://doi.org/10.1007/978-981-13-1501-5_40
