BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali)

Abstract

Recognizing biomedical entities in text is important for biomedical and health science research because it benefits many downstream tasks, including entity linking, relation extraction, and entity resolution. English and a few other widely used languages have ample resources for automatic biomedical entity recognition. Bangla, a low-resource language, does not. To address this gap, we introduce BanglaBioMed, the first Bangla biomedical named-entity (NE) annotated dataset. It uses the standard IOB format and contains over 12,000 tokens annotated with biomedical entities. We built the corpus by collecting Bangla text from a set of health articles and annotating it with four entity types: Anatomy (AN), Chemical and Drugs (CD), Disease and Symptom (DS), and Medical Procedure (MP). We describe the full data collection and annotation procedure and report various statistics of the resulting corpus. The corpus is a much-needed addition to Bangla NLP resources and will facilitate biomedical NLP research in Bangla.
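The abstract states only that the corpus uses the standard IOB format with the four labels AN, CD, DS, and MP. The Python below is a minimal sketch of how such data could be read and grouped into entity spans. It assumes a CoNLL-style layout: each line holds one whitespace-separated token and its tag, and blank lines separate sentences. It also assumes tags of the form `B-DS`, `I-DS`, and `O`. The file layout, the exact tag strings, the helper names `read_iob` and `decode_spans`, and the sample sentence are all hypothetical illustrations, not taken from the released corpus.

```python
from typing import List, Optional, Tuple

# The four entity types named in the abstract. The "B-X"/"I-X" tag spelling is an assumption.
ENTITY_TYPES = {"AN", "CD", "DS", "MP"}

# (entity type, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def read_iob(text: str) -> List[List[Tuple[str, str]]]:
    """Parse hypothetical CoNLL-style text: one 'token tag' pair per line,
    with blank lines between sentences."""
    sentences: List[List[Tuple[str, str]]] = []
    current: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue
        # The tag is the last whitespace-separated field on the line.
        token, tag = line.rsplit(maxsplit=1)
        current.append((token, tag))
    if current:
        sentences.append(current)
    return sentences


def decode_spans(tags: List[str]) -> List[Span]:
    """Group IOB tags into entity spans.

    An I-X tag that does not continue an open X span starts a new span,
    so both IOB1 and IOB2 (BIO) tagging are accepted."""
    spans: List[Span] = []
    etype: Optional[str] = None
    start = 0
    for i, tag in enumerate(tags):
        if tag == "O":
            # Outside any entity: close the open span, if there is one.
            if etype is not None:
                spans.append((etype, start, i))
            etype = None
            continue
        prefix, _, label = tag.partition("-")
        if prefix not in ("B", "I") or label not in ENTITY_TYPES:
            raise ValueError(f"unexpected tag {tag!r}")
        if prefix == "B" or label != etype:
            # A new entity begins here; close any span that was open.
            if etype is not None:
                spans.append((etype, start, i))
            etype, start = label, i
    if etype is not None:
        spans.append((etype, start, len(tags)))
    return spans


if __name__ == "__main__":
    # Hypothetical, hand-tagged sample ("take medicine to lower high blood pressure"),
    # not drawn from BanglaBioMed.
    sample = """উচ্চ B-DS
রক্তচাপ I-DS
কমাতে O
ওষুধ B-CD
খান O
"""
    for sentence in read_iob(sample):
        tokens = [tok for tok, _ in sentence]
        tags = [tag for _, tag in sentence]
        for etype, start, end in decode_spans(tags):
            print(etype, " ".join(tokens[start:end]))
    # Prints "DS উচ্চ রক্তচাপ" and then "CD ওষুধ".
```

The decoder treats an `I-X` that does not continue an open `X` span as the start of a new span. In IOB1, `B-` appears only between adjacent entities of the same type. In IOB2/BIO, every entity begins with `B-`. Because "standard IOB" can mean either scheme, this rule lets the same decoder handle both.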

Citation (APA)

Sazzed, S. (2022). BanglaBioMed: A Biomedical Named-Entity Annotated Corpus for Bangla (Bengali). In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 323–329). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.bionlp-1.31