Development of biomedical corpus enlargement platform using BERT for bio-entity recognition

Abstract

As the volume and availability of textual data increase dramatically in the digital age, a major challenge is how to extract useful information from online sources. A key component of the text mining pipeline is named entity recognition (NER), which extracts knowledge from text. Many NER tools are publicly available, such as Stanford NLP, NLTK, and the spaCy Python library. However, accurately recognizing unknown entities remains a problem. We focus on deep learning for entity recognition, as it has been shown to outperform traditional algorithms on big data, in part because of its ability to extract features and handle multi-dimensionality. In this paper, we applied the state-of-the-art language representation model BERT (Bidirectional Encoder Representations from Transformers) to NER classification in order to enlarge the existing biomedical corpus for further machine learning processing. We used additional biomedical corpora for training and then compared the results with a recent prior work. The end result is a precision improvement of 2.24%, a recall improvement of 3.55%, and an F1-score improvement of 2.98% in protein recognition for the super-pathway of leucine, valine, and isoleucine biosynthesis. We also developed a prototype, in the form of an internal web platform, to support bio-annotators and corpus enlargement.
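
For readers unfamiliar with the reported metrics, the following is a minimal sketch of how precision, recall, and F1-score can be computed for NER output. It assumes BIO-tagged sequences and exact-match scoring at the entity level; the abstract does not state which tagging scheme or scoring granularity the authors used, and the `PROT` label and the function names `bio_to_spans` and `prf1` are hypothetical, chosen only for illustration.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); an illustrative representation
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence."""
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes a span ending at the last token.
    for i, tag in enumerate(tags + ["O"]):
        # An open span continues only on an I- tag with the same entity type.
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def prf1(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Exact-match entity-level precision, recall, and F1."""
    gold_spans, pred_spans = bio_to_spans(gold), bio_to_spans(pred)
    true_positives = len(gold_spans & pred_spans)
    precision = true_positives / len(pred_spans) if pred_spans else 0.0
    recall = true_positives / len(gold_spans) if gold_spans else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: one of two gold protein mentions is predicted correctly,
# and one predicted mention is spurious.
gold_tags = ["B-PROT", "I-PROT", "O", "B-PROT", "O"]
pred_tags = ["B-PROT", "I-PROT", "O", "O", "B-PROT"]
precision, recall, f1 = prf1(gold_tags, pred_tags)
```

Under this scoring, a predicted entity counts as correct only when both its boundaries and its type match a gold entity, so a partially overlapping prediction is penalized in both precision and recall.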

Cite

Phongwattana, T., & Chan, J. H. (2019). Development of biomedical corpus enlargement platform using BERT for bio-entity recognition. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11953 LNCS, pp. 454–463). Springer. https://doi.org/10.1007/978-3-030-36708-4_37