Comparing and combining some popular NER approaches on Biomedical tasks

Abstract

We compare three simple and popular approaches for NER: 1) SEQ (sequence-labeling with a linear token classifier), 2) SeqCRF (sequence-labeling with Conditional Random Fields), and 3) SpanPred (span-prediction with boundary token embeddings). We compare the approaches on four biomedical NER tasks: GENIA, NCBI-Disease, LivingNER (Spanish), and SocialDisNER (Spanish). The SpanPred model demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 1.3 and 0.6, respectively. The SeqCRF model also demonstrates state-of-the-art performance on LivingNER and SocialDisNER, improving F1 by 0.2 and 0.7, respectively. The SEQ model is competitive with the state of the art on the LivingNER dataset. We explore some simple ways of combining the three approaches. We find that majority voting consistently gives high precision and high F1 across all four datasets. Lastly, we implement a system that learns to combine the predictions of SEQ and SpanPred, generating systems that consistently give high recall and high F1 across all four datasets. On the GENIA dataset, we find that our learned combiner system significantly boosts F1 (+1.2) and recall (+2.1) over the systems being combined. We release all the well-documented code necessary to reproduce all systems at this GitHub repository.
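
The majority-voting combination mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe the paper's exact voting scheme, so the details here are assumptions: each system's output is represented as a set of (start, end, type) spans, a span counts as the same prediction only on an exact match of boundaries and type, and a span is kept when more than half of the systems predict it. The names `Span` and `majority_vote` are hypothetical and are not taken from the authors' code.

```python
from collections import Counter
from typing import List, Optional, Set, Tuple

# Assumed span representation: (start_token, end_token_exclusive, entity_type).
Span = Tuple[int, int, str]


def majority_vote(predictions: List[Set[Span]],
                  min_votes: Optional[int] = None) -> Set[Span]:
    """Keep the spans predicted by at least `min_votes` systems.

    By default `min_votes` is a strict majority of the systems,
    for example 2 out of 3.
    """
    if min_votes is None:
        min_votes = len(predictions) // 2 + 1
    # Each system contributes at most one vote per span because it is a set.
    votes = Counter(span for system in predictions for span in system)
    return {span for span, count in votes.items() if count >= min_votes}


# Toy predictions from three systems on one sentence (made-up values).
seq = {(0, 2, "Disease"), (5, 6, "Gene")}
seq_crf = {(0, 2, "Disease"), (8, 9, "Gene")}
span_pred = {(0, 2, "Disease"), (5, 6, "Gene"), (10, 12, "Disease")}

combined = majority_vote([seq, seq_crf, span_pred])
```

In this toy example, only the spans proposed by at least two of the three systems survive. Discarding spans that a single system proposes is consistent with the high precision the abstract reports for majority voting.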

Cite

Verma, H., Bergler, S., & Tahaei, N. (2023). Comparing and combining some popular NER approaches on Biomedical tasks. In Proceedings of the Annual Meeting of the Association for Computational Linguistics (pp. 273–279). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2023.bionlp-1.24
