Leveraging Small-BERT and Bio-BERT for Abbreviation Identification in Scientific Text


Abstract

Abbreviations are shortened forms of phrases that make long text easier to write and read, and they are an essential part of scientific writing. They save considerable time and space in scientific documents such as research articles, papers, and clinical notes. However, identifying abbreviations and mapping them to their full forms is challenging because the rules for forming abbreviations are numerous and constantly changing. At the same time, the rapid growth of scientific papers on the Web has greatly increased the need for an automatic abbreviation identification system. This paper therefore proposes an LSTM-based deep learning system that encodes the target word and its context sentence using two pre-trained BERT embeddings (Small BERT and Bio BERT). The proposed system classifies whether the target word is an abbreviation. We experimented with two scientific datasets, MeDal and SciAI, for the abbreviation detection task. We built abbreviation detection systems in two settings: (1) with a lowercase module and (2) without an explicit lowercase module. We observe that retaining the original letter case is crucial for abbreviation detection. Our system achieves an F1-score of 90.04% on the SciAI dataset and 85.68% on the MeDal dataset. To observe the domain-specific behavior of abbreviations, we also performed a cross-domain evaluation (trained on MeDal and tested on SciAI, and vice versa). We obtained an F1-score of 76.50% on SciAI data and 62.72% on MeDal data in the cross-domain setting. We compared our system with six other statistical systems for the abbreviation detection task. The results show that our system outperforms the other models by a significant margin.
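
The observation about letter case can be illustrated with a toy example. The sketch below is not the system described in the abstract, which relies on BERT embeddings and an LSTM; it only shows, under the assumption of a simple hand-made feature, how lowercasing erases a surface signal that separates many abbreviations from ordinary words. The names `uppercase_ratio` and `case_features` and the sample tokens are hypothetical.

```python
def uppercase_ratio(token: str) -> float:
    """Fraction of alphabetic characters in `token` that are uppercase."""
    letters = [ch for ch in token if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isupper() for ch in letters) / len(letters)


def case_features(tokens, lowercase=False):
    """Return (token, uppercase_ratio) pairs, optionally lowercasing first."""
    if lowercase:
        # Mimics a pipeline with an explicit lowercase step (illustrative only).
        tokens = [t.lower() for t in tokens]
    return [(t, uppercase_ratio(t)) for t in tokens]


# Hypothetical sample sentence fragment, chosen for illustration.
tokens = ["The", "LSTM", "encodes", "mRNA", "context"]

cased = case_features(tokens)                    # original case kept
uncased = case_features(tokens, lowercase=True)  # case discarded

for (tok, ratio), (_, low_ratio) in zip(cased, uncased):
    print(f"{tok:10s} cased={ratio:.2f} lowercased={low_ratio:.2f}")
```

With the original case, tokens such as "LSTM" and "mRNA" have a high uppercase ratio, while after lowercasing every token scores zero, so this particular cue is no longer available to a classifier.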

Cite

Miglani, P., Vatsal, P., & Sharma, R. (2023). Leveraging Small-BERT and Bio-BERT for Abbreviation Identification in Scientific Text. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13913 LNCS, pp. 566–576). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-35320-8_43
