LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification

Abstract

Rising mortality from leukemia has driven rapid growth in publications on the disease. This growth has greatly enlarged the biomedical literature, making manual extraction of pertinent material on leukemia increasingly difficult. Text classification is one approach to retrieving relevant, high-quality information from the biomedical literature. This research proposes an LSTM-CNN hybrid model for classifying an imbalanced dataset of PubMed abstracts centred on leukemia. Random Undersampling and Random Oversampling are combined to address the data imbalance. The classification model’s performance is improved by using BioWordVec, a pre-trained word embedding created specifically for the biomedical domain. Model evaluation indicates that hybrid resampling combined with domain-specific pre-trained word embeddings can enhance classification performance, achieving accuracy, precision, recall, and F1-score of 99.55%, 99%, 100%, and 99%, respectively. The results suggest that this approach could serve as an alternative technique for retrieving information about leukemia.
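
To make the hybrid resampling idea concrete, the following is a minimal sketch of combining random undersampling and random oversampling, written with the Python standard library only. It is not the authors' implementation: the function name `hybrid_resample`, the `target_per_class` parameter, the toy data, and the 90/10 class split are all assumptions chosen for illustration, since the abstract does not state the resampling ratios used.

```python
import random
from collections import defaultdict


def hybrid_resample(samples, labels, target_per_class, seed=0):
    """Bring every class to exactly target_per_class items.

    Hypothetical helper for illustration. Classes larger than the target
    are randomly undersampled (without replacement); classes smaller than
    the target are randomly oversampled by duplicating randomly chosen
    items (with replacement).
    """
    rng = random.Random(seed)

    # Group samples by their class label.
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    out_samples, out_labels = [], []
    for label, items in by_class.items():
        if len(items) >= target_per_class:
            # Random undersampling: keep a random subset of the class.
            chosen = rng.sample(items, target_per_class)
        else:
            # Random oversampling: keep all items, then add random duplicates.
            extra = rng.choices(items, k=target_per_class - len(items))
            chosen = items + extra
        out_samples.extend(chosen)
        out_labels.extend([label] * target_per_class)

    # Shuffle so the classes are interleaved rather than grouped.
    order = list(range(len(out_samples)))
    rng.shuffle(order)
    return [out_samples[i] for i in order], [out_labels[i] for i in order]


# Toy stand-in for an imbalanced set of abstracts (assumed 90/10 split).
abstracts = [f"relevant abstract {i}" for i in range(90)] + [
    f"irrelevant abstract {i}" for i in range(10)
]
labels = [1] * 90 + [0] * 10

balanced_x, balanced_y = hybrid_resample(abstracts, labels, target_per_class=50)
```

In this sketch the majority class shrinks from 90 to 50 items and the minority class grows from 10 to 50, so neither class is reduced or duplicated as heavily as it would be with undersampling or oversampling alone. In practice, resampling of this kind is normally applied to the training split only, so that duplicated items do not leak into the evaluation data.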

Cite

Kurniasari, D., Warsono, Usman, M., Lumbanraja, F. R., & Wamiliana. (2024). LSTM-CNN Hybrid Model Performance Improvement with BioWordVec for Biomedical Report Big Data Classification. Science and Technology Indonesia, 9(2), 273–283. https://doi.org/10.26554/sti.2024.9.2.273-283
