TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla

Abstract

Biological and healthcare domains, artistic works, and organization names can all have nested, overlapping, discontinuous entity mentions that may be syntactically or semantically ambiguous in practice. Traditional sequence tagging algorithms are unable to recognize these complex mentions because they violate the assumptions upon which sequence tagging schemes are founded. In this paper, we describe our contribution to SemEval 2022 Task 11 on identifying such complex named entities. We leveraged an ensemble of ELECTRA-based models exclusively pretrained on the Bangla language with ELECTRA-based monolingual models pretrained on English to achieve competitive performance. Besides providing a system description, we also present the outcomes of our experiments on architectural decisions, dataset augmentations and post-competition findings.

Cite

Tasnim, N., Shihab, I., Sushmit, A. S., Bethard, S., & Sadeque, F. (2022). TEAM-Atreides at SemEval-2022 Task 11: On leveraging data augmentation and ensemble to recognize complex Named Entities in Bangla. In SemEval 2022 - 16th International Workshop on Semantic Evaluation, Proceedings of the Workshop (pp. 1524–1530). Association for Computational Linguistics (ACL). https://doi.org/10.18653/v1/2022.semeval-1.209
